(57) Abstract: The present invention provides compositions and methods for producing fusion proteins that comprise an amino acid
sequence tag. The amino acid sequence tag may be an amino acid sequence that is capable of being post-translationally modified;
for example, the amino acid sequence may be an amino acid sequence that is capable of being biotinylated. The amino acid sequence
tag may also be an amino acid sequence that is recognized by an antibody (or fragment thereof) or other specific interacting reagent.
The invention includes isolated nucleic acid molecules comprising one or more nucleic acid sequences which encode an amino
acid sequence tag. The nucleic acid molecules of the invention may also comprise one or more recombination sites and/or one or
more topoisomerase recognition sites and/or one or more topoisomerases. The nucleic acid molecules of the invention can be used
in recombinational cloning and/or topoisomerase-mediated cloning methods in order to produce polynucleotide constructs which
encode fusion proteins that comprise an amino acid sequence tag. Also provided are host cells, kits and compositions comprising
the nucleic acid molecules of the invention.
METHODS AND COMPOSITIONS FOR THE PRODUCTION,
IDENTIFICATION AND PURIFICATION OF FUSION PROTEINS 

BACKGROUND OF THE INVENTION 
Field of the Invention 

[0001] The present invention relates to compositions and methods for
producing fusion proteins. More specifically, the invention relates to
compositions and methods for producing fusion proteins that comprise an 
amino acid sequence tag. Exemplary amino acid sequence tags include amino 
acid sequences that are capable of being post-translationally modified, and 
amino acid sequences that are capable of being recognized by an antibody (or 
fragment thereof) or other specific binding reagent. 

[0002] The invention relates to nucleic acid molecules that can be used in
recombinational cloning methods and/or topoisomerase-mediated cloning
methods to produce polynucleotide constructs that encode fusion proteins, 
e.g., fusion proteins that comprise one or more amino acid sequence tags. The 
invention also relates to methods for producing fusion proteins in a variety of 
prokaryotic and eukaryotic cell types. The invention also relates to methods 
for identifying and purifying fusion proteins by utilizing, e.g., binding 
molecules and compositions that bind specifically to the fusion protein. 

Related Art 

[0003] Many areas of biotechnology and molecular biology rely on the
production and purification of recombinant proteins. When recombinant
proteins are produced in vivo they are generally produced in addition to a wide 
variety of endogenous proteins and other macromolecules in a host cell. 
Various strategies are employed to isolate and/or identify recombinant 
proteins from the cellular milieu. One strategy is to produce a fusion protein 
which comprises the protein of interest joined to an amino acid sequence tag. 

[0004] When a fusion protein is produced that comprises a tag that is capable
of being post-translationally modified, the post-translational modification can
be exploited to isolate or identify the fusion protein, especially when (a) very
few or no endogenous proteins or molecules contain the same post- 
translational modification in the host cell, and (b) a molecule is available 
which is capable of physically interacting with the post-translationally 
modified protein. 

[0005] One particular post-translational modification that has been used to
isolate and/or identify recombinant fusion proteins is biotinylation. For
instance, a fusion protein can be produced which comprises a protein of 
interest joined to an amino acid sequence to which a biotin moiety can be 
covalently bound. The biotinylation reaction will occur in vivo, i.e., in the 
host cell. The biotinylated fusion protein can then be isolated from the 
endogenous components of the host cell by providing a molecule that interacts 
specifically with the biotin moiety. Usually, the biotin-interacting molecule 
will be bound to a bead or other solid support which can be easily separated 
from the rest of the cellular components. 
[0006] Amino acid sequences which are capable of being biotinylated include,
for example, a domain of the 1.3S subunit of Propionibacterium shermanii
transcarboxylase (PSTCD) that is naturally biotinylated at lysine 89 of the
domain. (Cronan, J.E., J. Biol. Chem. 265:10327-10333 (1990); Murtif, V.L.,
et al., Proc. Natl. Acad. Sci. USA 82:5617-5621 (1985)). Another example is
a 72 amino acid peptide derived from the C-terminus (amino acids 524-595) of
the Klebsiella pneumoniae oxalacetate decarboxylase α subunit. (Schwarz, E.
et al., J. Biol. Chem. 263:9640-9645 (1988)). Fusion proteins containing
biotinylation domains have been shown to be biotinylated by endogenous
biotinylation components in bacteria, yeast and mammalian cells. (Cronan,
J.E., J. Biol. Chem. 265:10327-10333 (1990); Jank, M.M. et al., Protein Expr.
Purif. 17:123-127 (1999); Parrott, M.B. and Barry, M.A., Biochem. Biophys.
Res. Comm. 281:993-1000 (2001); Parrott, M.B. and Barry, M.A., Molecular
Therapy 1:96-104 (2000); U.S. Patent No. 5,252,466 and references cited
therein).
[0007] Avidin has been shown to interact very strongly with biotin. The
non-covalent interaction between avidin and biotin represents one of the strongest
and most specific interactions commonly used in molecular biology. The
interaction between avidin and biotin is estimated to have an affinity
coefficient of 10⁻¹⁴ to 10⁻¹⁵, which is several orders of magnitude greater than a
typical antibody-antigen interaction. (Rosano, C. et al., Biomol. Eng. 16:5-12
(1999); Green, N.M., Methods Enzymol. 184:51-61 (1990); Airenne, K.J. et
al., Protein Expr. Purif. 17:139-145 (1999); Wilchek, M. and Bayer, E.A.,
Methods Enzymol. 184:5-13 (1990)). Avidin analogs, including streptavidin,
are also available for specifically interacting with biotin.
[0008] As an alternative to producing a protein or polypeptide that is capable
of being post-translationally modified, it is sometimes useful to produce a
fusion protein that comprises an amino acid sequence that is identifiable by 
particular reagents, including, e.g., antibodies (or fragments thereof) or other 
binding compounds that can recognize certain polypeptides or amino acid 
sequences. 

[0009] In order to produce a recombinant fusion protein that comprises a
particular amino acid sequence tag, a nucleic acid molecule must first be
constructed which encodes the desired fusion protein. The construction of the 
recombinant nucleic acid molecule will generally involve the attachment of at 
least two individual nucleotide sequences: (1) a sequence encoding the protein 
of interest, and (2) a sequence encoding an amino acid sequence tag. 
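
By way of illustration only, the in-frame attachment of the two nucleotide sequences described above can be sketched in Python. The function name join_in_frame and the short placeholder sequences in the usage note are hypothetical and are not sequences of the invention; the sketch assumes that both inputs are DNA coding sequences supplied in reading frame and that the tag is placed upstream of the sequence of interest.

```python
# Illustrative sketch only; join_in_frame is a hypothetical helper.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def join_in_frame(tag_cds: str, gene_cds: str) -> str:
    """Join a tag-coding sequence upstream of a coding sequence of interest.

    Both inputs are assumed to be DNA coding sequences whose lengths are
    multiples of three. The tag must not contain an in-frame stop codon,
    otherwise translation would end before the sequence of interest.
    """
    tag_cds, gene_cds = tag_cds.upper(), gene_cds.upper()
    for name, seq in (("tag", tag_cds), ("gene", gene_cds)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} sequence length is not a multiple of three")
    # Split the tag into codons and reject any in-frame stop codon.
    tag_codons = [tag_cds[i:i + 3] for i in range(0, len(tag_cds), 3)]
    if any(codon in STOP_CODONS for codon in tag_codons):
        raise ValueError("tag sequence contains an in-frame stop codon")
    return tag_cds + gene_cds
```

For example, join_in_frame("ATGGCT", "ATGAAATAA") returns a single open reading frame in which the placeholder tag codons precede the placeholder sequence of interest.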

[0010] Multiple nucleic acid sequences can be joined using conventional in
vitro cloning methods which employ restriction endonucleases and DNA
ligation enzymes. More rapid and efficient methods are available, however, 
which involve site-specific recombination and/or topoisomerase-mediated 
joining of nucleic acid sequences. Recombinational and topoisomerase-
mediated cloning methods have been described in detail elsewhere. (Hartley,
J.L., et al., Genome Res. 10:1788-1795 (2000); Shuman, S., J. Biol. Chem.
269:32678-32684 (1994); Shuman, S., Proc. Natl. Acad. Sci. USA 88:10104-
10108 (1991); U.S. Patent Nos. 5,851,808, 5,888,732, 6,143,557, 6,171,861,
6,270,969, 6,277,608 and 6,410,317; and commonly owned, co-pending U.S.
Patent Application No. 10/005,876 (filed 12/07/01)).
[0011] Briefly, recombinational cloning, specifically the Gateway™ Cloning
System (available from Invitrogen Corporation), utilizes vectors that contain at
least one and preferably at least two different site-specific recombination sites
based on the bacteriophage lambda system (e.g., att1 and att2) that are
mutated from the wild type (att0) sites. Each mutated site has a unique
specificity for its cognate partner att site of the same type (for example attB1
with attP1, or attL1 with attR1) and will not cross-react with recombination
sites of the other mutant type or with the wild-type att0 site. Nucleic acid
fragments flanked by recombination sites are cloned and subcloned using the
Gateway™ system by replacing a selectable marker (for example, ccdB)
flanked by att sites on the recipient plasmid molecule, sometimes termed the
Destination Vector. Desired clones are then selected by transformation of a
ccdB sensitive host strain and positive selection for a marker on the recipient
molecule. Similar strategies for negative selection (e.g., use of toxic genes)
can be used in other organisms, such as thymidine kinase (TK) in mammals
and insects. Other recombinational cloning systems are available such as, e.g.,
Echo™ (Invitrogen Corporation) and Creator (Clontech).
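
The cognate-partner specificity described above (for example, attB1 with attP1, or attL1 with attR1, with no cross-reaction between different mutant types or with the wild-type site) can be sketched, purely for illustration, as a lookup rule in Python. The names PARTNER_TYPE, parse_att and can_recombine are hypothetical, and the rule is a simplified model of the stated specificity rather than a description of the recombination reaction itself.

```python
import re

# Simplified model (an assumption for illustration): attB pairs with attP,
# and attL pairs with attR, but only when both sites carry the same variant
# number (0 denotes the wild-type site).
PARTNER_TYPE = {"B": "P", "P": "B", "L": "R", "R": "L"}


def parse_att(site: str):
    """Split a site name such as 'attB1' into its type ('B') and variant (1)."""
    match = re.fullmatch(r"att([BPLR])(\d+)", site)
    if match is None:
        raise ValueError(f"unrecognized att site name: {site!r}")
    return match.group(1), int(match.group(2))


def can_recombine(site_a: str, site_b: str) -> bool:
    """Return True if the two sites are cognate partners of the same variant."""
    type_a, variant_a = parse_att(site_a)
    type_b, variant_b = parse_att(site_b)
    return PARTNER_TYPE[type_a] == type_b and variant_a == variant_b
```

Under this model, can_recombine("attB1", "attP1") is true, whereas can_recombine("attB1", "attP2") and can_recombine("attB1", "attP0") are both false.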
[0012] Topoisomerase cloning can be used to generate a double-stranded
recombinant nucleic acid molecule covalently linked in one strand. This
method can be performed by contacting a first nucleic acid molecule which
has a site-specific topoisomerase recognition site (e.g., a type IA or a type II
topoisomerase recognition site), or a cleavage product thereof, at a 5' or 3'
terminus, with a second (or other) nucleic acid molecule, and optionally, a
topoisomerase (e.g., a type IA, type IB, and/or type II topoisomerase), such
that the second nucleotide sequence can be covalently attached to the first
nucleotide sequence. Topoisomerase cloning can also be used to generate a
double-stranded recombinant nucleic acid molecule covalently linked in both
strands. This method can be performed, for example, by contacting a first
nucleic acid molecule having a first end and a second end, wherein, at the first
end or second end or both, the first nucleic acid molecule has a topoisomerase
recognition site (or cleavage product thereof) at or near the 3' terminus; at least
a second nucleic acid molecule having a first end and a second end, wherein,
at the first end or second end or both, the at least second double stranded
nucleotide sequence has a topoisomerase recognition site (or cleavage product
thereof) at or near a 3' terminus; and at least one site specific topoisomerase
(e.g., a type IA and/or a type IB topoisomerase), under conditions such that all
components are in contact and the topoisomerase can effect its activity. A
covalently linked double-stranded recombinant nucleic acid produced by this
method is characterized, in part, in that it does not contain a nick in either
strand at the position where the nucleic acid molecules are joined. The method
may be performed by contacting a first nucleic acid molecule and a second (or other)
nucleic acid molecule, each of which has a topoisomerase recognition site, or a
cleavage product thereof, at the 3' termini or at the 5' termini of two ends to be
covalently linked. Alternatively, the method can be performed by contacting a
first nucleic acid molecule having a topoisomerase recognition site, or
cleavage product thereof, at the 5' terminus and the 3' terminus of at least one
end, and a second (or other) nucleic acid molecule having a 3' hydroxyl group
and a 5' hydroxyl group at the end to be linked to the end of the first nucleic
acid molecule containing the recognition sites. Topoisomerase cloning methods
can be performed using any number of nucleic acid molecules having various
combinations of termini and ends.
[0013] Cloning schemes are also available which use both recombinational
cloning and topoisomerase cloning methods. Such methods may involve first
joining two nucleic acid sequences using recombinational cloning to create a 
product nucleic acid molecule, followed by joining the product nucleic acid 
molecule to another nucleic acid molecule using topoisomerase cloning. 
Conversely, two nucleic acid molecules may be joined, first, by using
topoisomerase cloning to create a product nucleic acid molecule, followed by
joining the product nucleic acid molecule to another nucleic acid molecule
using recombinational cloning.
[0014] Recombinational cloning methods, topoisomerase cloning methods,
and combinations thereof, heretofore have not been described in the art for
producing nucleic acid constructs that encode fusion proteins that comprise 
one or more amino acid sequence tags. Accordingly, a need exists in the art 
for rapid and efficient compositions and methods that enable the production of 
nucleic acid molecules which encode fusion proteins. 

BRIEF SUMMARY OF THE INVENTION 

[0015] The present invention satisfies the aforementioned need in the art by
providing compositions and methods for producing fusion proteins which
comprise one or more amino acid sequences of interest and one or more amino 
acid sequence tags. An "amino acid sequence tag," as used herein, includes, 
e.g., amino acid sequences that are capable of being post-translationally 
modified, and/or amino acid sequences that are capable of being recognized by 
an antibody (or fragment thereof) or other specific binding reagent. 
[0016] The invention includes isolated nucleic acid molecules comprising one
or more nucleic acid sequences which encode an amino acid sequence tag.
The isolated nucleic acid molecules of the invention may further comprise one 
or more recombination sites. Alternatively or additionally, the isolated nucleic 
acid molecules of the invention may further comprise one or more 
topoisomerase recognition sites and/or one or more topoisomerases. Thus, in 
certain embodiments, the invention includes isolated nucleic acid molecules 
comprising: (a) one or more recombination sites; (b) one or more 
topoisomerase recognition sites and/or one or more topoisomerases; and (c) 
one or more nucleic acid sequences which encode an amino acid sequence tag. 
[0017] In addition to the aforementioned elements, the nucleic acid molecules
of the invention may further comprise additional elements. Exemplary
additional elements that may be included within the nucleic acid molecules of 
the invention include, e.g., one or more promoters, one or more operators, one 
or more enhancers, one or more ribosome binding sites, one or more initiation 
codons, one or more nucleic acid sequences that encode an amino acid
sequence that is capable of being cleaved by one or more proteases, one or
more nucleic acid sequences of interest (e.g., one or more nucleic acid 
sequences that encode one or more proteins or polypeptides of interest), one or 
more polyadenylation signals and/or one or more transcription termination 
regions. As understood by those skilled in the art, other elements may be 
included within the nucleic acid molecules of the invention depending on the 
circumstances under which the nucleic acids may be used. 
[0018] In a preferred embodiment, the elements of the isolated nucleic acid
molecules of the invention are arranged relative to one another such that a
nucleic acid sequence of interest can be attached to the nucleic acid molecules 
of the invention, thereby producing a polynucleotide construct that encodes a 
fusion protein, the fusion protein comprising: (i) an amino acid sequence tag; 
and (ii) the amino acid sequence encoded by said nucleic acid sequence of 
interest. The fusion protein may be, e.g., an N-terminal fusion protein (e.g., 
wherein an amino acid sequence tag is covalently attached at or near the N- 
terminus of the amino acid sequence encoded by said nucleic acid sequence of 
interest). The fusion protein may also be, e.g., a C-terminal fusion protein 
(e.g., wherein an amino acid sequence tag is covalently attached at or near the 
C-terminus of the amino acid sequence encoded by said nucleic acid sequence 
of interest). The fusion protein may also be, e.g., an N-terminal and C-terminal 
fusion protein (e.g., wherein an amino acid sequence tag is covalently attached 
at or near the N-terminus of the amino acid sequence encoded by said nucleic 
acid sequence of interest and an amino acid sequence tag is covalently 
attached at or near the C-terminus of the amino acid sequence encoded by said 
nucleic acid sequence of interest). 

[0019] The invention also includes nucleic acid molecules that are created
following the attachment of a nucleic acid sequence of interest to a nucleic
acid molecule comprising: (a) a nucleic acid sequence that encodes an amino 
acid sequence tag; and/or (b) one or more recombination sites; and/or (c) one 
or more topoisomerase recognition sites and/or one or more topoisomerases. 

[0020] In order to produce a polynucleotide sequence that encodes a fusion
protein that comprises one or more amino acid sequence tags, a nucleic acid
sequence of interest may, for example, be inserted at or within 20 nucleotides
of said one or more recombination sites. The nucleic acid sequence may also 
be inserted at or within 20 nucleotides of said one or more topoisomerase 
recognition sites and/or at or within 20 nucleotides of the position of said one 
or more topoisomerases in order to produce a polynucleotide sequence that 
encodes a fusion protein that comprises an amino acid sequence tag. 
[0021] The nucleic acid molecules of the invention may further comprise a
nucleic acid sequence that encodes an amino acid sequence that is capable of
being cleaved by one or more proteases. The position of such a nucleic acid 
sequence, relative to the other elements of the nucleic acid molecules of the 
invention, will be such that a nucleic acid sequence of interest can be attached
to the nucleic acid molecules of the invention, thereby producing a
polynucleotide construct that encodes a fusion protein, the fusion protein
comprising: (i) said amino acid sequence that is capable of being cleaved by
one or more proteases, flanked on one side by (ii) the amino acid sequence tag,
and on the other side by (iii) the amino acid sequence encoded by the nucleic
acid sequence of interest.
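
The relative arrangement of the fusion protein elements described above can be sketched, for illustration only, as an ordered list running from the N-terminus to the C-terminus. The function fusion_elements and its parameter names are hypothetical; the sketch assumes a single tag at one terminus, with an optional protease-cleavable sequence placed between the tag and the amino acid sequence of interest.

```python
from typing import List, Optional


def fusion_elements(tag: str, protein: str, tag_position: str = "N",
                    cleavage_site: Optional[str] = None) -> List[str]:
    """Return the elements of a fusion protein in N- to C-terminal order.

    tag_position is 'N' for an N-terminal fusion or 'C' for a C-terminal
    fusion. If a cleavage site is given, it is flanked on one side by the
    tag and on the other side by the protein of interest.
    """
    if tag_position not in ("N", "C"):
        raise ValueError("tag_position must be 'N' or 'C'")
    middle = [cleavage_site] if cleavage_site else []
    if tag_position == "N":
        return [tag] + middle + [protein]
    return [protein] + middle + [tag]
```

In either orientation the cleavable sequence sits between the tag and the sequence of interest, so that protease treatment would separate the two.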
[0022] In certain embodiments, the nucleic acid sequence that encodes an
amino acid sequence tag may be, e.g., a nucleic acid sequence that encodes an
amino acid sequence that is capable of being post-translationally modified.
For example, the nucleic acid sequence may be a nucleic acid sequence which
encodes an amino acid sequence that is capable of being post-translationally
modified by, e.g., biotinylation, attachment of 4'-phosphopantetheine,
attachment of lipoic acid, attachment of flavins, etc. In a preferred
embodiment, the amino acid sequence is capable of being biotinylated. An
exemplary nucleic acid sequence that encodes a protein or polypeptide having
an amino acid sequence that is capable of being biotinylated is a nucleic acid
sequence which encodes a portion of the C-terminus of the Klebsiella
pneumoniae oxalacetate decarboxylase α subunit, e.g., an amino acid
sequence known as the Biotag™.
[0023] In certain other embodiments, the nucleic acid sequence that encodes
an amino acid sequence tag may be, e.g., a nucleic acid sequence which
encodes an amino acid sequence that is capable of being recognized by an 
antibody (or fragment thereof) or other specific binding reagent. Such amino 
acid sequences are known in the art and include, e.g., a 6-Histidine tag, an 
epitope tag (e.g., an amino acid sequence recognized by a specific antibody (or 
fragment thereof) such as, e.g., the FLAG tag, the Myc tag, the HA tag, etc.).
Thus, the nucleic acid molecules of the invention can, in some embodiments,
be used to produce fusion proteins comprising: (i) an amino acid sequence
that is capable of being recognized by
a specific antibody (or fragment thereof) or other compound or reagent, and
(ii) an amino acid sequence encoded by a nucleotide sequence of interest.

[0024] The invention also includes methods for producing polynucleotide
constructs that encode fusion proteins that comprise one or more amino acid
sequence tags. In certain embodiments, the invention generally includes 
methods of attaching a first nucleic acid molecule (e.g., a nucleic acid 
molecule which has a nucleotide sequence which encodes a particular protein 
or polypeptide of interest) to a second nucleic acid molecule which comprises
one or more nucleic acid sequences which encode an amino acid sequence tag.
The attachment of the first nucleic
acid molecule to the second nucleic acid molecule may be accomplished by, 
e.g., recombination (e.g., recombinational cloning) and/or by topoisomerase- 
mediated cloning. The attachment of the first nucleic acid molecule to the 
second nucleic acid molecule will preferably result in a product polynucleotide 
construct which encodes a fusion protein, said fusion protein comprising: (i) 
the amino acid sequence tag; and (ii) the amino acid sequence encoded by the 
nucleotide sequence of the first nucleic acid molecule. 

[0025] The invention also includes methods of producing fusion proteins that
comprise one or more amino acid sequence tags. Also included are methods
for producing fusion proteins that can be purified, concentrated or otherwise 
identified. The methods, according to this aspect of the invention, may 
comprise: (a) obtaining a host cell comprising a polynucleotide construct that 
encodes a fusion protein that comprises one or more amino acid sequence tags, 
said polynucleotide construct produced according to a method of the
invention; and (b) culturing said host cell under conditions wherein said fusion 
protein is produced by said host cell. The methods of the invention may 
further comprise culturing said host cell under conditions wherein said fusion 
protein is post-translationally modified in said host cell. In other embodiments 
of this aspect of the invention, the methods further comprise: (a) causing said 
fusion protein to be released from said host cell or treating said host cell such 
that said fusion protein is released from said host cell; and (b) contacting said 
fusion protein with a detecting composition comprising a molecule that is 
capable of interacting specifically with said fusion protein. 

[0026] In certain exemplary embodiments, said fusion protein is a fusion
protein that has been post-translationally modified, e.g., a biotinylated fusion
protein, and said detecting composition comprises avidin, streptavidin, or 
analogs and derivatives thereof. 

[0027] The invention further comprises vectors comprising the nucleic acid
molecules of the invention, host cells comprising the nucleic acid and/or
vectors of the invention, and kits comprising the nucleic acid molecules, 
vectors, and/or host cells of the invention. 

BRIEF DESCRIPTION OF THE DRAWINGS 

[0028] Fig. 1 is a map which shows the general characteristics of
pET104-DEST.

[0029] Figs. 2A-2C show the nucleotide sequence of pET104-DEST (SEQ ID
NO:1).

[0030] Fig. 3 is a map which shows the general characteristics of
pET104/GW/lacZ.

[0031] Fig. 4 is a map which shows the general characteristics of
pET104/D-TOPO.

[0032] Figs. 5A-5B show the nucleotide sequence of pET104/D-TOPO (SEQ
ID NO:2).

[0033] Fig. 6 is a map which shows the general characteristics of
pET104/D/lacZ.

[0034] Fig. 7 is a map which shows the general characteristics of
pcDNA6/Biotag™-DEST.

[0035] Figs. 8A-8B show the nucleotide sequence of pcDNA6/Biotag™-DEST
(SEQ ID NO:3).

[0036] Fig. 9 is a map which shows the general characteristics of
pcDNA6/Biotag™-GW/lacZ.

[0037] Fig. 10 is a map which shows the general characteristics of
pcDNA6/Biotag™/D-TOPO.

[0038] Figs. 11A-11B show the nucleotide sequence of
pcDNA6/Biotag™/D-TOPO (SEQ ID NO:4).

[0039] Fig. 12 is a map which shows the general characteristics of
pcDNA6/Biotag™/lacZ.

[0040] Fig. 13 is a map which shows the general characteristics of
pMT/Biotag™-DEST.

[0041] Figs. 14A-14B show the nucleotide sequence of pMT/Biotag™-DEST
(SEQ ID NO:5).

[0042] Fig. 15 is a map which shows the general characteristics of
pMT/Biotag™/GW-lacZ.

[0043] Fig. 16 is a depiction of the recombination region of the expression
clone resulting from pET104-DEST x entry clone, showing the nucleotide
sequence of the recombination region (SEQ ID NO:25) and the amino acid
sequence encoded therefrom (SEQ ID NO:26).

[0044] Fig. 17 is a schematic representation of the mechanism by which
TOPO cloning is accomplished.

[0045] Fig. 18 is a flow-chart describing the general steps required for cloning
and expressing a blunt-end PCR product using pET104/D-TOPO.

[0046] Fig. 19 is a depiction of a region of the pET104/D-TOPO vector
surrounding the Biotag™, showing the nucleotide sequence of the region (SEQ
ID NO:27) and the amino acid sequence encoded therefrom (SEQ ID NO:28).

[0047] Fig. 20 is a depiction of the recombination region of the expression
clone resulting from pcDNA6/Biotag™-DEST x entry clone, showing the
nucleotide sequence of the recombination region (SEQ ID NO:29) and the
amino acid sequence encoded therefrom (SEQ ID NO:30).

[0048] Fig. 21 is a flow-chart describing the general steps required for cloning
and expressing a blunt-end PCR product using pcDNA6/Biotag™/D-TOPO.

[0049] Fig. 22 is a depiction of a region of the pcDNA6/Biotag™/D-TOPO
vector surrounding the Biotag™, showing the nucleotide sequence of the
region (SEQ ID NO:31) and the amino acid sequence encoded therefrom
(SEQ ID NO:32).

[0050] Fig. 23 is a depiction of the recombination region of the expression
clone resulting from pMT/Biotag™-DEST x entry clone, showing the
nucleotide sequence of the recombination region (SEQ ID NO:33) and the
amino acid sequence encoded therefrom (SEQ ID NO:34).

[0051] Fig. 24 is a map which shows the general characteristics of pCoHygro.

[0052] Fig. 25 is a map which shows the general characteristics of pCoBlast.

DETAILED DESCRIPTION OF THE INVENTION 

[0053] The present invention relates generally to compositions and methods
for producing nucleic acid molecules which encode fusion proteins, e.g.,
fusion proteins that comprise one or more amino acid sequence tags. The 
invention also relates to methods for producing, purifying, concentrating and 
isolating fusion proteins using the compositions and methods described herein. 

[0054] The invention relates to nucleic acid molecules comprising: (a) one or
more recombination sites; and (b) one or more nucleic acid sequences which
encode one or more amino acid sequence tags. 

[0055] The invention also relates to isolated nucleic acid molecules
comprising: (a) one or more topoisomerase recognition sites and/or one or
more topoisomerases; and (b) one or more nucleic acid sequences which 
encode one or more amino acid sequence tags. 
[0056] The invention also relates to isolated nucleic acid molecules
comprising: (a) one or more recombination sites; (b) one or more
topoisomerase recognition sites and/or one or more topoisomerases; and (c) 
one or more nucleic acid sequences which encode one or more amino acid 
sequence tags. 

[0057] The nucleic acid molecules of the invention may be circular molecules,
or they may be linear molecules.

[0058] As used herein, a nucleotide is a base-sugar-phosphate combination.
Nucleotides are monomeric units of a nucleic acid molecule (DNA and RNA).
The term nucleotide includes ribonucleoside triphosphates ATP, UTP, CTP,
GTP and deoxyribonucleoside triphosphates such as dATP, dCTP, dITP,
dUTP, dGTP, dTTP, or derivatives thereof. Such derivatives include, for
example, [αS]dATP, 7-deaza-dGTP and 7-deaza-dATP. The term nucleotide
as used herein also refers to dideoxyribonucleoside triphosphates (ddNTPs)
and their derivatives. Illustrated examples of dideoxyribonucleoside
triphosphates include, but are not limited to, ddATP, ddCTP, ddGTP, ddITP,
and ddTTP. According to the present invention, a "nucleotide" may be
unlabeled or detectably labeled by well known techniques. Detectable labels 
include, for example, radioactive isotopes, fluorescent labels, 
chemiluminescent labels, bioluminescent labels and enzyme labels. 

[0059] As used herein, a nucleic acid molecule is a sequence of contiguous
nucleotides (riboNTPs, dNTPs or ddNTPs, or combinations thereof) of any
length which may encode a full-length polypeptide or a fragment of any length 
thereof, or which may be non-coding. As used herein, the terms "nucleic acid 
molecule" and "polynucleotide" and "polynucleotide construct" may be used
interchangeably. 

[0060] Polymerases for use in the invention include but are not limited to
polymerases (DNA and RNA polymerases), and reverse transcriptases. DNA
polymerases include, but are not limited to, Thermus thermophilus (Tth) DNA
polymerase, Thermus aquaticus (Taq) DNA polymerase, Thermotoga
neapolitana (Tne) DNA polymerase, Thermotoga maritima (Tma) DNA
polymerase, Thermococcus litoralis (Tli or VENT™) DNA polymerase,
Pyrococcus furiosus (Pfu) DNA polymerase, DEEP VENT™ DNA
polymerase, Pyrococcus woesei (Pwo) DNA polymerase, Pyrococcus sp
KOD2 (KOD) DNA polymerase, Bacillus stearothermophilus (Bst) DNA
polymerase, Bacillus caldophilus (Bca) DNA polymerase, Sulfolobus
acidocaldarius (Sac) DNA polymerase, Thermoplasma acidophilum (Tac)
DNA polymerase, Thermus flavus (Tfl/Tub) DNA polymerase, Thermus ruber
(Tru) DNA polymerase, Thermus brockianus (DYNAZYME™) DNA
polymerase, Methanobacterium thermoautotrophicum (Mth) DNA
polymerase, Mycobacterium DNA polymerase (Mtb, Mlep), E. coli pol I DNA
polymerase, T5 DNA polymerase, T7 DNA polymerase, and generally pol I
type DNA polymerases and mutants, variants and derivatives thereof. RNA
polymerases such as T3, T5, T7 and SP6 and mutants, variants and derivatives
thereof may also be used in accordance with the invention.
) The nucleic acid polymerases used in the present invention may be 

mesopbilic or thermophilic, and are preferably thermophilic. Preferred 
mesophilic DNA polymerases include Pol I family of DNA polymerases (and 
their respective Klenow fragments) • any of which may be isolated from 
organism such as E. coli, H. influenzae, D. radiodurans, H. pylori, C. 
aurantiacus, R. prowazekii, T.pallidum, Synechocystis sp., B. subtilis, L. 
lactis, S. pneumoniae, M. tuberculosis, M. leprae, M. smegmatis, 
Bacteriophage L5, phi-C31 , T7, T3, T5, SP01, SP02, mitochondrial from S. 
cerevisiae MIP-1, and eukaryotic C. elegans, and D. melanogaster (Astatke, 
M. et al., 1998, J. Mol. Biol. 278, 147-165), pol m type DNA polymerase 
isolated from any sources, and mutants, derivatives or variants thereof, and the 
like. Preferred thermostable DNA polymerases that may be used in the 
methods and compositions of the invention include Taq, Tne, Tma, Pfu, KOD, 
Tfl, Tth, Stoffel fragment, VENT™ and DEEP VENT™ DNA polymerases, 
and mutants, variants and derivatives thereof (U.S. Patent No. 5,436,149; U.S. 
Patent 4,889,818; U.S. Patent 4,965,188; U.S. Patent 5,079,352; U.S. Patent 
5,614,365; U.S. Patent 5,374,553; U.S. Patent 5,270,179; U.S. Patent 
5,047,342; U.S. Patent No. 5,512,462; WO 92/06188; WO 92/06200; WO 
96/10640; WO 97/09451; Barnes, W.M., Gene 112:29-35 (1992); Lawyer, 
F.C., et al., PCR Meth. Appl. 2:275-287 (1993); Flaman, J.-M., et al., Nucl. 
Acids Res. 22(15):3259-3260 (1994)). 
[0062] Reverse transcriptases for use in this invention include any enzyme 

having reverse transcriptase activity. Such enzymes include, but are not 
limited to, retroviral reverse transcriptase, retrotransposon reverse 
transcriptase, hepatitis B reverse transcriptase, cauliflower mosaic virus 
reverse transcriptase, bacterial reverse transcriptase, Tth DNA polymerase, 
Taq DNA polymerase (Saiki, R.K., et al., Science 239:487-491 (1988); U.S. 
Patent Nos. 4,889,818 and 4,965,188), Tne DNA polymerase (WO 96/10640 
and WO 97/09451), Tma DNA polymerase (U. S. Patent No. 5,374,553) and 
mutants, variants or derivatives thereof (see, e.g., WO 97/09451 and WO 
98/47912). Preferred enzymes for use in the invention include those that have 
reduced, substantially reduced or eliminated RNase H activity. By an enzyme 
"substantially reduced in RNase H activity" is meant that the enzyme has less 
than about 20%, more preferably less than about 15%, 10% or 5%, and most 
preferably less than about 2%, of the RNase H activity of the corresponding 
wildtype or RNase H+ enzyme such as wildtype Moloney Murine Leukemia 
Virus (M-MLV), Avian Myeloblastosis Virus (AMV) or Rous Sarcoma Virus 
(RSV) reverse transcriptases. The RNase H activity of any enzyme may be 
determined by a variety of assays, such as those described, for example, in 
U.S. Patent No. 5,244,797, in Kotewicz, M.L., et al., Nucl. Acids Res. 16:265 
(1988) and in Gerard, G.F., et al., FOCUS 14(5):91 (1992), the disclosures of 
all of which are fully incorporated herein by reference. Particularly preferred 
polypeptides for use in the invention include, but are not limited to, M-MLV 
H- reverse transcriptase, RSV H- reverse transcriptase, AMV H- reverse 
transcriptase, RAV (rous-associated virus) H- reverse transcriptase, MAV 
(myeloblastosis-associated virus) H- reverse transcriptase and HIV H- reverse 
transcriptase. (See U.S. Patent No. 5,244,797 and WO 98/47912.) It will be 
understood by one of ordinary skill, however, that any enzyme capable of 
producing a DNA molecule from a ribonucleic acid molecule (i.e., having 
reverse transcriptase activity) may be equivalently used in the compositions, 
methods and kits of the invention. 
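
The activity tiers stated above for "substantially reduced" RNase H activity can be sketched as a simple comparison against the wildtype enzyme. This is a minimal illustration only: the function name, the threshold tuple, and the use of hard cutoffs (the text says "about") are assumptions, and the activity values are whatever units a given assay reports.

```python
# Percent-of-wildtype thresholds named in the text, tightest first.
THRESHOLDS_PCT = (2.0, 5.0, 10.0, 15.0, 20.0)


def tightest_threshold(variant_activity: float, wildtype_activity: float):
    """Return the smallest stated threshold (in percent) that the variant
    falls below, or None if it retains 20% or more of wildtype activity.

    Hypothetical helper: both activities must come from the same assay.
    """
    if wildtype_activity <= 0:
        raise ValueError("wildtype activity must be positive")
    percent = 100.0 * variant_activity / wildtype_activity
    for threshold in THRESHOLDS_PCT:
        if percent < threshold:
            return threshold
    return None
```

For example, a variant retaining 12% of wildtype activity satisfies the "less than about 15%" tier but not the 10% tier.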
[0063] As used herein, a polypeptide is a sequence of contiguous amino acids, 

of any length. As used herein, the terms "peptide," "oligopeptide," or 
"protein" may be used interchangeably with the term "polypeptide." 

[0064] As used herein, the term "amino acid sequence tag" is intended to 

mean any amino acid sequence that can be attached to, connected to, or linked 
to a heterologous amino acid sequence (e.g., an amino acid sequence of 
interest) and that can be used to identify, purify, concentrate or isolate said 
heterologous amino acid sequence. The attachment of the amino acid 
sequence tag to the heterologous amino acid sequence may occur, e.g., by 
constructing a nucleic acid molecule that comprises: (a) a nucleic acid 
sequence that encodes the amino acid sequence tag, and (b) a nucleic acid 
sequence that encodes a heterologous amino acid sequence. Exemplary amino 
acid sequence tags include, e.g., amino acid sequences that are capable of 
being post-translationally modified. Other exemplary amino acid sequence 
tags include, e.g., amino acid sequences that are capable of being recognized 
and/or bound by an antibody (or fragment thereof) or other specific binding 
reagent. 

[0065] As used herein, the expression "amino acid sequence that is capable of 

being post-translationally modified" is intended to mean any amino acid 
sequence, or portion thereof, that can be recognized, in vivo or in vitro, by an 
enzyme or other molecule that is capable of covalently attaching a chemical 
entity to one or more amino acids within the amino acid sequence. 

[0066] As used herein, the term "post-translationally modified protein" is 

intended to mean at least one protein or polypeptide that has undergone or has 
been subjected to a post-translational modification. The term "post- 
translational modification" is intended to mean a modification that can take 
place in vivo (within a cell) or in vitro (outside a cell) whereby one or more 
chemical entities are covalently attached to at least one amino acid within the 
post-translational modification site by means of one or more enzymatic 
reactions. The site or sites include not only the amino acid that is modified, but 
any other amino acids, in the proper sequence, that are necessary to allow the 
post-translational modification to occur. 
[0067] In the context of the present invention, the amino acid sequences that 

are capable of being post-translationally modified include amino acid 
sequences that are capable of being modified by any type of post-translational 
modification that provides a marker for a protein or polypeptide. The post- 
translational modifications that are included within the present invention 
include those that can be used, directly or indirectly, to identify a protein or 
polypeptide or to isolate it from a mixture of other materials, including other 
proteins, such as those found in a cell extract or in medium in which a host 
cell has been cultured and which contains the protein or polypeptide. 

[0068] Amino acid sequences that are capable of being post-translationally 

modified include amino acid sequences that can be subjected to multiple (e.g., 2, 
3, 4, or 5 or more) post-translational modifications. 

[0069] Preferred post-translational modifications are those that are utilized by 

a host cell to modify only a small number of proteins. Exemplary post- 
translational modifications that can be used with the present invention include 
biotinylation, attachment of 4'-phosphopantetheine, attachment of lipoic acid, 
attachment of flavins, and glycosylation. Further details regarding post- 
translational modifications of amino acid sequences can be found in U.S. 
Patent No. 5,252,466 and the references cited therein. 

[0070] In a preferred embodiment of the invention, the amino acid sequence 

that is capable of being post-translationally modified is an amino acid 
sequence that is capable of being biotinylated (Parrott, M.B. and Barry, M.A., 
Biochem. Biophys. Res. Comm. 281:993-1000 (2001); Parrott, M.B. and Barry, 
M.A., Mol. Ther. 1:96-104 (2000)). Amino acid sequences that are capable of 
being biotinylated are known in the art. Exemplary amino acid sequences that 
are capable of being biotinylated include, e.g., all or a portion of the Klebsiella 
pneumoniae oxalacetate decarboxylase α subunit, all or a portion of the 
Propionibacterium shermanii transcarboxylase 1.3S subunit, and all or a 
portion of the Escherichia coli biotin carboxyl carrier protein component of 
acetyl-CoA carboxylase. 
[0071] According to certain embodiments of the invention, the amino acid 

sequence that is capable of being biotinylated is an amino acid sequence 
derived from the C-terminus of the Klebsiella pneumoniae oxalacetate 
decarboxylase α subunit. In particular embodiments, the amino acid sequence 
that is capable of being biotinylated is a 72 amino acid peptide derived from 
the C-terminus of the Klebsiella pneumoniae oxalacetate decarboxylase α 
subunit (Schwarz, E. et al., J. Biol. Chem. 263:9640-9645 (1988)). This 72 
amino acid sequence is also known as "the BIOTAG™." Biotin is covalently 
attached to the oxalacetate decarboxylase α subunit and peptide sequencing 
has identified a single biotin binding site at lysine 561 of the protein 
(Schwarz, E. et al., J. Biol. Chem. 263:9640-9645 (1988)). When fused to a 
heterologous protein, the BIOTAG™ enables the in vivo biotinylation of the 
recombinant protein of interest. It is preferred that the entire 72 amino acid 
domain be used to ensure recognition by the cellular biotinylation enzymes. 
Additional details regarding cellular biotinylation enzymes and the 
mechanisms of biotinylation can be found in Chapman-Smith, A. and Cronan, 
J., J. Nutr. 129:477S-484S (1999). 

[0072] Exemplary amino acid sequences that are capable of being biotinylated 

are listed in Table I. The nucleotide sequences encoding the exemplary amino 
acid sequence tags are listed in Table II. 

TABLE I: Exemplary Amino Acid Sequences That are Capable of Being Biotinylated 

Amino Acid Sequence Tag: K. pneumoniae oxalacetate decarboxylase α subunit (Biotag™) 
Amino Acid Sequence: 
GAGTPVTAPLAGTIWKVLASEGQTVAAGE 
VLLILEAMKMETEIRAAQAGTVRGIAVKAG 
DAVAVGDTLMTLA (SEQ ID NO:6) 

Amino Acid Sequence Tag: Mouse pyruvate decarboxylase domain 
Amino Acid Sequence: 
KALAVSDLNRAGQRQVFFELNGQLRSILVK 
DTQAMKEMHFHPKALKDVKGQIGAPMPGK 
VIDIKVAAGDKVAKGQPLCVLSAMKMETV 
VTSPMEGTIRKVHVTKDMTLEGDDLIL 
(SEQ ID NO:7) 

Amino Acid Sequence Tag: P. shermanii transcarboxylase domain 
Amino Acid Sequence: 
MKLKVTVNGTAYDVDVDVDKSHENPMGTI 
LFGGGTGGAPAPRAAGGAGAGKAGEGEIP 
APLAGTVSKILVKEGDTVKAGQTVLVLEA 
MKMETEINAPTDGKVEKVLVKERDAVQGG 
QGLIKIG (SEQ ID NO:8) 

Amino Acid Sequence Tag: Human acetyl CoA carboxylase domain 
Amino Acid Sequence: 
GSCVEVDVHRLSDGGLLLSYDGSSYTTYM 
KEEVDRYRITIGNKTCVFEKENDPSVMRSPS 
AGKLIQYIVEDGGHVFAGQCYAEIEVMKM 
VMTLTAVESGCIHYVKRPGAALDPGCVLA 
KMQL (SEQ ID NO:9) 

Amino Acid Sequence Tag: E. coli acetyl CoA carboxylase BCCP subunit 
Amino Acid Sequence: 
MDIRKIKKLIELVEESGISELEISEGEESVRIS 
RAAPAASFPVMQQAYAAPMMQQPAQSNA 
AAPATVPSMEAPAAAEISGHIVRSPMVGTF 
YRTPSPDAKAFIEVGQKVNVGDTLCIVEAM 
KMMNQIEADKSGTVKAILVESGQPVEFDEP 
LVVIE (SEQ ID NO:10) 

TABLE II: Nucleotide Sequences of Exemplary Amino Acid Sequence Tags 

Amino Acid Sequence Tag: K. pneumoniae oxalacetate decarboxylase α subunit (Biotag™) 
Nucleotide Sequence Encoding the Amino Acid Sequence Tag: 
ggcgccggcaccccggtgaccgccccgctggcgggcactatctgg 
aaggtgctggccagcgaaggccagacggtggccgcaggcgaggt 
gctgctgattctggaagccatgaagatggaaaccgaaatccgcgcc 
gcgcaggccgggaccgtgcgcggtatcgcggtgaaagccggcga 
cgcggtggcggtcggcgacaccctgatgaccctggcg (SEQ ID NO:11) 

Amino Acid Sequence Tag: Mouse pyruvate decarboxylase domain 
Nucleotide Sequence Encoding the Amino Acid Sequence Tag: 
aaagccctggctgtaagcgacctgaaccgtgctggccagaggcag 
gtgttctttgaactcaatgggcagcttcgatccattctggttaaagaca 
cccaggccatgaaggagatgcacttccatcccaaggctttgaaggat 
gtgaagggccaaattggggccccgatgcctgggaaggtcatagac 
atcaaggtggcagcaggggacaaggtggctaagggccagcccctc 
tgtgtgctcagcgccatgaagatggagactgtggtgacttcgcccat 
ggagggcactatccgaaaggttcatgttaccaaggacatgactctgg 
aaggcgacgacctcatccta (SEQ ID NO:12) 

Amino Acid Sequence Tag: P. shermanii transcarboxylase domain 
Nucleotide Sequence Encoding the Amino Acid Sequence Tag: 
atgaaactgaaggtaacagtcaacggcactgcgtatgacgttgacgt 
tgacgtcgacaagtcacacgaaaacccgatgggcaccatcctgttc 
ggcggcggcaccggcggcgcgccggcaccgcgcgcagcaggtg 
gcgcaggcgccggtaaggccggagagggcgagattcccgctccg 
ctggccggcaccgtctccaagatcctcgtgaaggagggtgacacg 
gtcaaggctggtcagaccgtgctcgttctcgaggccatgaagatgga 
gaccgagatcaacgctcccaccgacggcaaggtcgagaaggtcct 
tgtcaaggagcgtgacgccgtgcagggcggtcagggtctcatcaag 
atcggc (SEQ ID NO:13) 

Amino Acid Sequence Tag: Human acetyl CoA carboxylase domain 
Nucleotide Sequence Encoding the Amino Acid Sequence Tag: 
ggctcatgtgtagaagtagatgtacatcggctgagtgacggtggact 
gctcttgtcctatgatggcagcagttacaccacgtatatgaaggagga 
agtagacagatatcgcatcacaattggcaataaaacctgtgtgtttga 
gaaggaaaatgacccatcggtgatgcgctcaccttctgctgggaagt 
taatccagtacattgtagaagatggaggtcatgtgtttgccggccagt 
gctatgcagagattgaggtaatgaagatggtaatgactttgacagctg 
tggagtctggctgtatccattacgtcaagcgtcctggagcagctcttg 
accctggctgtgtactcgccaaaatgcaactg (SEQ ID NO:14) 

Amino Acid Sequence Tag: E. coli acetyl CoA carboxylase BCCP subunit 
Nucleotide Sequence Encoding the Amino Acid Sequence Tag: 
atggatattcgtaagattaaaaaactgatcgagctggttgaagaatca 
ggcatctccgaactggaaatttctgaaggcgaagagtcagtacgcat 
tagccgtgcagctcctgccgcaagtttccctgtgatgcaacaagctta 
cgctgcaccaatgatgcagcagccagctcaatctaacgcagccgct 
ccggcgaccgttccttccatggaagcgccagcagcagcggaaatc 
agtggtcacatcgtacgttccccgatggttggtactttctaccgcaccc 
caagcccggacgcaaaagcgttcatcgaagtgggtcagaaagtca 
acgtgggcgataccctgtgcatcgttgaagccatgaaaatgatgaac 
cagatcgaagcggacaaatccggtaccgtgaaagcaattctggtcg 
aaagtggacaaccggtagaatttgacgagccgctggtcgtcatcga 
g (SEQ ID NO:15) 
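
The correspondence between the nucleotide sequences of Table II and the amino acid sequences of Table I can be checked by translating each nucleotide sequence with the standard genetic code. The sketch below does this for the Biotag™ entry (SEQ ID NO:11 against SEQ ID NO:6). The helper names are illustrative and not part of the disclosure, and the standard codon table is assumed to apply.

```python
# Standard genetic code, built from the conventional t/c/a/g ordering.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence (reading frame 1) into one-letter codes."""
    dna = dna.lower()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Biotag nucleotide sequence as listed in Table II (SEQ ID NO:11).
BIOTAG_DNA = (
    "ggcgccggcaccccggtgaccgccccgctggcgggcactatctgg"
    "aaggtgctggccagcgaaggccagacggtggccgcaggcgaggt"
    "gctgctgattctggaagccatgaagatggaaaccgaaatccgcgcc"
    "gcgcaggccgggaccgtgcgcggtatcgcggtgaaagccggcga"
    "cgcggtggcggtcggcgacaccctgatgaccctggcg"
)

# Biotag amino acid sequence as listed in Table I (SEQ ID NO:6).
BIOTAG_AA = (
    "GAGTPVTAPLAGTIWKVLASEGQTVAAGE"
    "VLLILEAMKMETEIRAAQAGTVRGIAVKAG"
    "DAVAVGDTLMTLA"
)

if __name__ == "__main__":
    print(translate(BIOTAG_DNA) == BIOTAG_AA)
    print(len(BIOTAG_AA))
```

The same function can be applied to the other Table II entries to cross-check them against Table I.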
[0073] An amino acid sequence tag, as used herein, may alternatively or 

additionally be an amino acid sequence that is capable of being recognized by 
an antibody (or fragment thereof) or other specific binding reagent. The 
expression "amino acid sequence that is capable of being recognized by an 
antibody (or fragment thereof) or other specific binding reagent" is intended to 
mean any amino acid sequence, or portion thereof, with which a particular 
compound or reagent can interact or to which it can bind, either covalently or 
non-covalently. Such amino acid sequences are known in the art. Preferred amino 
acid sequences that are capable of being recognized by an antibody (or 
fragment thereof) or other specific binding reagent include, e.g., those that are 
known in the art as "epitope tags." An epitope tag may be a natural or an 
artificial epitope tag. Natural and artificial epitope tags are known in the art, 
including, e.g., artificial epitopes such as FLAG, Strep, or poly-histidine 
peptides. FLAG peptides include the sequence Asp-Tyr-Lys-Asp-Asp-Asp- 
Asp-Lys (SEQ ID NO:16) or Asp-Tyr-Lys-Asp-Glu-Asp-Asp-Lys (SEQ ID 
NO:17) (Einhauer, A. and Jungbauer, A., J. Biochem. Biophys. Methods 
49(1-3):455-465 (2001)). The Strep epitope has the sequence Ala-Trp-Arg-His-Pro- 
Gln-Phe-Gly-Gly (SEQ ID NO:18). The VSV-G epitope can also be used and 
has the sequence Tyr-Thr-Asp-Ile-Glu-Met-Asn-Arg-Leu-Gly-Lys (SEQ ID 
NO:19). Another artificial epitope is a poly-His sequence having six histidine 
residues (His-His-His-His-His-His (SEQ ID NO:20)). Naturally-occurring 
epitopes include the influenza virus hemagglutinin (HA) sequence Tyr-Pro- 
Tyr-Asp-Val-Pro-Asp-Tyr-Ala-Ile-Glu-Gly-Arg (SEQ ID NO:21) recognized 
by the monoclonal antibody 12CA5 (Murray et al., Anal. Biochem. 229:170- 
179 (1995)) and the eleven amino acid sequence from human c-myc (Myc) 
recognized by the monoclonal antibody 9E10 (Glu-Gln-Lys-Leu-Leu-Ser-Glu- 
Glu-Asp-Leu-Asn (SEQ ID NO:22)) (Manstein et al., Gene 162:129-134 
(1995)). Another useful epitope is the tripeptide Glu-Glu-Phe (SEQ ID 
NO:23) which is recognized by the monoclonal antibody YL 1/2 (Stammers 
et al., FEBS Lett. 283:298-302 (1991)). 
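
As an illustration of how the epitope tags described above might be handled computationally, the following sketch converts several of the three-letter sequences given in the text into one-letter codes and scans a protein sequence for them. The function names, the dictionary, and the example fusion sequence are hypothetical; only the FLAG, Strep, poly-His and HA sequences are taken from the paragraph above.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(sequence: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in sequence.split("-"))


# Epitope tags as written in the text (SEQ ID NOs: 16, 18, 20 and 21).
EPITOPES = {
    "FLAG": three_to_one("Asp-Tyr-Lys-Asp-Asp-Asp-Asp-Lys"),
    "Strep": three_to_one("Ala-Trp-Arg-His-Pro-Gln-Phe-Gly-Gly"),
    "poly-His": three_to_one("His-His-His-His-His-His"),
    "HA": three_to_one("Tyr-Pro-Tyr-Asp-Val-Pro-Asp-Tyr-Ala-Ile-Glu-Gly-Arg"),
}


def find_tags(protein: str) -> list:
    """Return the sorted names of all known epitope tags found in a protein."""
    return sorted(name for name, tag in EPITOPES.items() if tag in protein)


if __name__ == "__main__":
    # Hypothetical fusion: Met, FLAG, a Gly-Ser linker, then six histidines.
    print(find_tags("MDYKDDDDKGSHHHHHH"))
```

A plain substring search is used here for simplicity; it does not model antibody binding, only the presence of the tag sequence.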
[0074] The nucleic acid molecules of the invention may include a variety of 

elements. The nucleic acid molecule of the invention preferably comprises 
one or more nucleic acid sequences which encode one or more amino acid 
sequence tags. The nucleic acid molecules may also comprise one or more 
recombination sites and/or one or more topoisomerase recognition sites and/or 
one or more topoisomerases. 

[0075] The nucleic acid molecules of the invention may also comprise one or 

more selectable markers, one or more cloning sites, one or more restriction 
sites, one or more promoters, one or more operators (e.g., a tet operator, a 
galactose operon operator, a lac operon operator, and the like), one or more 
operons, one or more origins of replication, one or more nucleotide sequences 
that encode a gene product which allows for negative selection, one or more 
nucleotide sequences which encode a repressor of at least one promoter, and 
one or more genes or gene products. Additional elements useful for molecular 
biology applications will be known to those skilled in the art and can be 
included within the nucleic acid molecules of the invention as well. The exact 
combination of elements, and their relative locations within the nucleic acid 
molecules of the invention, may vary depending on the intended uses of the 
nucleic acid molecules. 
[0076] As used herein, a selectable marker is intended to include a nucleic 

acid segment that allows one to select for or against a molecule (e.g., a 
replicon) or a cell that contains it, often under particular conditions. These 
markers can encode an activity, such as, but not limited to, production of 
RNA, peptide, or protein, or can provide a binding site for RNA, peptides, 
proteins, inorganic and organic compounds or compositions and the like. 
Examples of selectable markers include but are not limited to: (1) nucleic acid 
segments that encode products which provide resistance against otherwise 
toxic compounds (e.g., antibiotics); (2) nucleic acid segments that encode 
products which are otherwise lacking in the recipient cell (e.g., tRNA genes, 
auxotrophic markers); (3) nucleic acid segments that encode products which 
suppress the activity of a gene product; (4) nucleic acid segments that encode 
products which can be readily identified (e.g., phenotypic markers such as 
β-galactosidase, green fluorescent protein (GFP), enhanced green fluorescent 
protein (EGFP), and cell surface proteins); (5) nucleic acid segments that bind 
products which are otherwise detrimental to cell survival and/or function; (6) 
nucleic acid segments that otherwise inhibit the activity of any of the nucleic 
acid segments described in Nos. 1-5 above (e.g., antisense oligonucleotides); 
(7) nucleic acid segments that bind products that modify a substrate (e.g. 
restriction endonucleases); (8) nucleic acid segments that can be used to 
isolate or identify a desired molecule (e.g. specific protein binding sites); (9) 
nucleic acid segments that encode a specific nucleotide sequence which can be 
otherwise non-functional (e.g., for PCR amplification of subpopulations of 
molecules); (10) nucleic acid segments, which when absent, directly or 
indirectly confer resistance or sensitivity to particular compounds; and/or (11) 
nucleic acid segments that encode products which are toxic in recipient cells. 
[0077] Exemplary selectable markers that can be included within the nucleic 

acid molecules of the invention include, e.g., a gene encoding a product that 
confers resistance to chloramphenicol, e.g., a chloramphenicol resistance gene 
(CmR), a gene encoding a product that confers resistance to ampicillin, e.g., a 
gene which encodes β-lactamase, a gene encoding a product that confers 
resistance to other antibiotic compounds, a ccdB gene or other toxic genes 
(allowing for counterselection of the nucleic acid molecule), and a gene 
encoding a product that confers resistance to blasticidin, e.g., a bsd resistance 
gene. Any other selectable marker gene known in the art can be included 
within the nucleic acid molecules of the invention. 
[0078] A "cloning site," as used herein, includes any nucleic acid region 

which contains at least one restriction endonuclease cleavage site. The nucleic 
acid molecules of the invention may also comprise "multiple cloning sites." A 
multiple cloning site is any nucleic acid region which contains two or more 
restriction endonuclease cleavage sites. "Restriction endonuclease cleavage 
sites" are also referred to in the art as "restriction sites." 
[0079] As used herein, a promoter is an example of a transcriptional 

regulatory sequence, and is specifically a nucleic acid sequence generally 
described as the 5'-region of a gene located proximal to the start codon. The 
transcription of an adjacent nucleic acid segment is initiated at the promoter 
region. A repressible promoter's rate of transcription decreases in response to 
a repressing agent. An inducible promoter's rate of transcription increases in 
response to an inducing agent. A constitutive promoter's rate of transcription 
is not specifically regulated, though it can vary under the influence of general 
metabolic conditions. 
[0080] Any promoter known to those skilled in the art can be included in the 

nucleic acid molecules of the invention. Exemplary promoters include, e.g., 
the T7 promoter, the human cytomegalovirus (CMV) immediate early 
enhancer/promoter, the SV40 early promoter, a metallothionein (MT) 
promoter, including, e.g., the Drosophila MT promoter. Other exemplary 
promoters include those that are inducible by, or can be repressed by, e.g., 
certain carbon sources (e.g., glucose, galactose, arabinose, etc.), salts, 
temperature changes (e.g., temperatures greater than or less than the normal 
physiological growth temperature), and other molecules. 
[0081] A number of operators are known in the art and can be included in the 

nucleic acid molecules of the invention. An example of an operator suitable 
for use with the invention is the tryptophan operator of the tryptophan operon 
of E. coli. The tryptophan repressor, when bound to two molecules of 
tryptophan, binds to the E. coli tryptophan operator and, when suitably 
positioned with respect to the promoter, blocks transcription. Another 
example of an operator suitable for use with the invention is the operator of the E. 
coli tetracycline operon. Components of the tetracycline resistance system of 
E. coli have also been found to function in eukaryotic cells and have been used 
to regulate gene expression. For example, the tetracycline repressor, which 
binds to the tetracycline operator in the absence of tetracycline and represses gene 
transcription, has been expressed in plant cells at sufficiently high 
concentrations to repress transcription from a promoter containing tetracycline 
operator sequences (Gatz et al., Plant J. 2:397-404 (1992)). The tetracycline 
regulated expression systems are described, for example, in U.S. Patent No. 
5,789,156, the entire disclosure of which is incorporated herein by reference. 
Additional examples of operators which can be used with the invention 
include the Lac operator and the operator of the molybdate transport 
operator/promoter system of E. coli (see, e.g., Cronin et al., Genes Dev. 
15:1461-1467 (2001) and Grunden et al., J. Biol. Chem., 274:24308-24315 
(1999)). 

[0082] Thus, in particular embodiments, the invention provides nucleic acid 

molecules that contain one or more operators which can be used to regulate 
expression in prokaryotic or eukaryotic cells. As one skilled in the art would 
recognize, when a nucleic acid molecule which contains an operator is placed 
under conditions in which transcriptional machinery is present, either in vivo 
or in vitro, regulation of expression will often be modulated by contacting the 
nucleic acid molecule with a repressor and one or more metabolites which 
facilitate binding of an appropriate repressor to the operator. Thus, the 
invention further provides nucleic acid molecules which encode repressors 
which modulate the function of operators. 

[0083] The nucleic acid molecules of the invention may comprise one or more 

genes or partial genes. As used herein, a gene is a nucleic acid sequence that 
contains information necessary for expression of a polypeptide, protein or 
functional RNA (e.g., a ribozyme, tRNA, rRNA, mRNA, etc.). It includes the 
promoter and the structural gene open reading frame sequence (orf) as well as 
other sequences involved in expression of the protein. As used herein, a 
structural gene refers to a nucleic acid sequence that is transcribed into 
messenger RNA that is then translated into a sequence of amino acids 
characteristic of a specific polypeptide. 

[0084] The range of positions of the various elements of the nucleic acid 

molecules of the invention, relative to one another, will be appreciated by 
persons having ordinary skill in the art. For example, a nucleic acid molecule 
within the scope of the invention may comprise (a) one or more recombination 
sites; and (b) one or more nucleic acid sequences which encode one or more 
amino acid sequence tags. In a preferred embodiment, elements (a) and (b) 
will be positioned relative to one another such that a nucleic acid sequence of 
interest can be inserted at or within 20 nucleotides of said one or more 
recombination sites, thereby producing a polynucleotide construct that 
encodes a fusion protein. Such fusion protein may comprise: (i) the amino 
acid sequence tag, and (ii) the amino acid sequence encoded by said nucleic 
acid sequence of interest. 
[0085] Similarly, a nucleic acid molecule within the scope of the invention 

may comprise (a) one or more topoisomerase recognition sites and/or one or 
more topoisomerases; and (b) one or more nucleic acid sequences which 
encode one or more amino acid sequence tags. In a preferred embodiment, 
elements (a) and (b) will be positioned relative to one another such that a 
nucleic acid sequence of interest can be inserted at or within 20 nucleotides of 
said one or more topoisomerase recognition sites and/or at or within 20 
nucleotides of the position of said one or more topoisomerases, thereby 
producing a polynucleotide construct that encodes a fusion protein. Such 
fusion protein may comprise: (i) the amino acid sequence tag, and (ii) the 
amino acid sequence encoded by said nucleic acid sequence of interest. 
[0086] Similarly, a nucleic acid molecule within the scope of the invention 

may comprise (a) one or more recombination sites; (b) one or more 
topoisomerase recognition sites and/or one or more topoisomerases; and (c) 
one or more nucleic acid sequences which encode one or more amino acid 
sequence tags. In a preferred embodiment, elements (a), (b) and (c) will be 
positioned relative to one another such that a nucleic acid sequence of interest 
can be inserted at or within 20 nucleotides of said one or more recombination 
sites, thereby producing a polynucleotide construct that encodes a fusion 
protein. Such fusion protein may comprise: (i) the amino acid sequence tag, 
and (ii) the amino acid sequence encoded by said nucleic acid sequence of 
interest. In another preferred embodiment, elements (a), (b) and (c) will be 
positioned relative to one another such that a nucleic acid sequence of interest 
can be inserted at or within 20 nucleotides of said one or more topoisomerase 
recognition sites and/or at or within 20 nucleotides of the position of said one 
or more topoisomerases, thereby producing a polynucleotide construct that 
encodes a fusion protein. Such fusion protein may comprise: (i) the amino 
acid sequence tag, and (ii) the amino acid sequence encoded by said nucleic 
acid sequence of interest. 
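
The positional condition recited in the preceding paragraphs, namely insertion "at or within 20 nucleotides" of a recombination site or topoisomerase recognition site, can be sketched as a coordinate check. This is a hypothetical illustration: the function name, the 0-based inclusive coordinates, and the reading of "at" as "anywhere inside the site" are assumptions, not definitions from the text.

```python
def insertion_allowed(insert_pos: int, site_start: int, site_end: int,
                      max_distance: int = 20) -> bool:
    """Return True if insert_pos lies inside the site or no more than
    max_distance nucleotides from either edge of it.

    Coordinates are 0-based and inclusive (an assumed convention).
    """
    if site_start > site_end:
        raise ValueError("site_start must not exceed site_end")
    return site_start - max_distance <= insert_pos <= site_end + max_distance
```

For a site occupying positions 100 to 124, positions 80 through 144 would satisfy the condition under these assumptions.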
[0087] In certain embodiments, the nucleic acid molecules of the invention 

will comprise a nucleic acid sequence that encodes an amino acid sequence 
that is capable of being recognized and/or cleaved by one or more proteases. 
Amino acid sequences that can be recognized and/or cleaved by one or more 
proteases are known in the art. Exemplary amino acid sequences are those 
that are recognized by the following proteases: factor VIIa, factor IXa, factor 
Xa, APC, t-PA, u-PA, trypsin, chymotrypsin, enterokinase, pepsin, cathepsin 
B, H, L, S, D, cathepsin G, renin, angiotensin converting enzyme, matrix 
metalloproteases (collagenases, stromelysins, gelatinases), macrophage 
elastase, C1r, and C1s. The amino acid sequences that are recognized by the 
aforementioned proteases are known in the art. Exemplary sequences 
recognized by certain proteases can be found, e.g., in U.S. Patent No. 
5,811,252. A preferred amino acid sequence that is capable of being 
recognized and/or cleaved by a protease is the enterokinase (EK) recognition 
site (Asp-Asp-Asp-Asp-Lys (SEQ ID NO:24)). 
[0088] The invention therefore also includes nucleic acid molecules 

comprising: (a) one or more recombination sites; (b) one or more nucleic acid 
sequences which encode one or more amino acid sequence tags; and (c) one or 
more nucleic acid sequences that encode an amino acid sequence that is 
capable of being recognized and/or cleaved by one or more proteases. 
[0089] The invention also includes nucleic acid molecules comprising: (a) one 

or more topoisomerase recognition sites and/or one or more topoisomerases; 
(b) one or more nucleic acid sequences which encode one or more amino acid 
sequence tags; and (c) one or more nucleic acid sequences that encode an 
amino acid sequence that is capable of being recognized and/or cleaved by one 
or more proteases. In a preferred aspect, the nucleic acid sequence that 
encodes an amino acid sequence that is capable of being recognized and/or 
cleaved by one or more proteases is positioned such that, upon cleavage, the 
amino acid sequence tag is completely or partially removed from the amino 
acid sequence of interest. In another aspect, the nucleic acid sequence that 
encodes an amino acid sequence that is capable of being recognized and/or 
cleaved by one or more proteases is positioned such that, upon cleavage, other 
sequences (e.g., topoisomerase recognition sequences and/or recombination 
sites) may be removed from the amino acid sequence of interest. 
[0090] The invention also includes nucleic acid molecules comprising: (a) one 

or more recombination sites; (b) one or more topoisomerase recognition sites 
and/or one or more topoisomerases; (c) one or more nucleic acid sequences 
which encode one or more amino acid sequence tags; and (d) one or more 
nucleic acid sequences that encode an amino acid sequence that is capable of 
being recognized and/or cleaved by one or more proteases. In a preferred 
aspect, the nucleic acid sequence that encodes an amino acid sequence that is 
capable of being recognized and/or cleaved by one or more proteases is 
positioned such that, upon cleavage, the amino acid sequence tag is 
completely or partially removed from the amino acid sequence of interest. In 
another aspect, the nucleic acid sequence that encodes an amino acid sequence 
that is capable of being recognized and/or cleaved by one or more proteases is 
positioned such that, upon cleavage, other sequences (e.g., topoisomerase 
recognition sequences and/or recombination sites) may be removed from the 
amino acid sequence of interest. 
[0091] The position of a nucleic acid sequence that encodes an amino acid 

sequence that is capable of being recognized and/or cleaved by one or more 
proteases, relative to the other elements of the nucleic acid molecules of the 
invention will be such that a nucleic acid sequence of interest can be inserted 
at or within 20 nucleotides of said one or more recombination sites, or at or 
within 20 nucleotides of said one or more topoisomerase recognition sites 
and/or at or within 20 nucleotides of the position of said one or more 
topoisomerases, thereby producing a polynucleotide construct that encodes a 
fusion protein. Such fusion protein may comprise: (i) said amino acid 
sequence that is capable of being cleaved by one or more proteases, flanked on 
one side by (ii) said amino acid sequence tag, and on the other side by (iii) the 
amino acid sequence encoded by said nucleic acid sequence of interest. 
[0092] This arrangement of elements will enable the production of a fusion 

protein of interest comprising an amino acid sequence tag, and will also enable 
the subsequent cleavage of the fusion protein by a protease, thereby separating 
the amino acid sequence tag from the amino acid sequence encoded by said 
nucleic acid sequence of interest. If the fusion protein is a fusion protein that 
is capable of being post-translationally modified, cleavage by the protease can 
be accomplished either before or after the post-translational modification of 
the fusion protein. 
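
The tag / protease site / protein-of-interest arrangement described above, and the subsequent separation of the tag by cleavage, can be sketched at the amino acid level as follows. This is a simplified illustration under stated assumptions: the function names and the example protein sequence are hypothetical, the enterokinase site is written in one-letter code (DDDDK), and cleavage is assumed to occur immediately after the final lysine of that site, a position the text itself does not specify.

```python
EK_SITE = "DDDDK"  # Asp-Asp-Asp-Asp-Lys (SEQ ID NO:24) in one-letter code


def build_fusion(tag: str, protein_of_interest: str,
                 protease_site: str = EK_SITE) -> str:
    """Assemble tag, protease site and protein of interest, in that order."""
    return tag + protease_site + protein_of_interest


def cleave(fusion: str, protease_site: str = EK_SITE) -> list:
    """Split the fusion after the first occurrence of the protease site.

    Assumption: the cut falls directly after the last residue of the site,
    so the site stays with the N-terminal (tag) fragment.
    """
    index = fusion.find(protease_site)
    if index == -1:
        return [fusion]  # no site present, nothing is cleaved
    cut = index + len(protease_site)
    return [fusion[:cut], fusion[cut:]]


if __name__ == "__main__":
    # Poly-His tag from the text fused to a made-up protein of interest.
    fusion = build_fusion("HHHHHH", "MSTNPKPQR")
    print(cleave(fusion))
```

Under these assumptions the tag and the protease site end up on one fragment and the protein of interest on the other, mirroring the separation described in the text.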

[0093] In addition to comprising one or more nucleic acid sequences which
encode one or more amino acid sequence tags and/or one or more
recombination sites and/or one or more topoisomerase recognition sites and/or
one or more topoisomerases and/or one or more nucleic acid sequences that
encode an amino acid sequence that is capable of being cleaved by one or
more proteases, the nucleic acid molecules of the invention may further 
comprise additional elements. Exemplary additional elements that can be 
included within the nucleic acid molecules of the invention include, e.g., one 
or more promoters, one or more selectable markers, one or more origins of 
replication, one or more operators, one or more enhancers, one or more 
ribosome binding sites, one or more initiation codons, one or more nucleic 
acid sequences of interest (e.g., one or more nucleic acid sequences encoding 
one or more proteins or polypeptides of interest), one or more polyadenylation
signals, and/or one or more transcription termination regions. As understood 
by those skilled in the art, other elements may be included within the nucleic 
acid molecules of the invention depending on the circumstances under which 
the nucleic acids are intended to be used. 
[0094] The possible arrangements of the various elements of the nucleic acid
molecules of the invention, relative to one another, will be appreciated by
persons having ordinary skill in the art. Non-limiting, exemplary 
arrangements are as follows: 
[0095] Exemplary arrangement I: (a) one or more promoters - (b) one or more
nucleic acid sequences which encode one or more amino acid sequence tags -
(c) one or more nucleic acid sequences that encode an amino acid sequence
that is capable of being cleaved by one or more proteases - (d) one or more
recombination sites and/or one or more topoisomerase recognition sites and/or 
one or more topoisomerases - (e) one or more polyadenylation signals and/or 
one or more transcription termination regions. 
[0096] Exemplary arrangement II: (a) one or more promoters - (b) one or
more nucleic acid sequences which encode one or more amino acid sequence
tags - (c) one or more nucleic acid sequences that encode an amino acid
sequence that is capable of being cleaved by one or more proteases - (d) one 
or more recombination sites and/or one or more topoisomerase recognition 
sites and/or one or more topoisomerases - (e) one or more nucleic acid 
sequences of interest - (f) one or more polyadenylation signals and/or one or 
more transcription termination regions. 
[0097] Exemplary arrangement III: (a) one or more promoters - (b) one or
more nucleic acid sequences which encode one or more amino acid sequence
tags - (c) one or more recombination sites and/or one or more topoisomerase 
recognition sites and/or one or more topoisomerases - (d) one or more 
polyadenylation signals and/or one or more transcription termination regions. 
[0098] Exemplary arrangement IV: (a) one or more promoters - (b) one or
more nucleic acid sequences which encode one or more amino acid sequence
tags - (c) one or more recombination sites and/or one or more topoisomerase 
recognition sites and/or one or more topoisomerases - (d) one or more nucleic 
acid sequences of interest - (e) one or more polyadenylation signals and/or 
one or more transcription termination regions. 
[0099] Exemplary arrangement V: (a) one or more promoters - (b) one or
more recombination sites and/or one or more topoisomerase recognition sites
and/or one or more topoisomerases - (c) one or more nucleic acid sequences
that encode an amino acid sequence that is capable of being cleaved by one or
more proteases - (d) one or more nucleic acid sequences which encode one or 
more amino acid sequence tags - (e) one or more polyadenylation signals 
and/or one or more transcription termination regions. 
[00100] Exemplary arrangement VI: (a) one or more promoters - (b) one or
more nucleic acid sequences of interest - (c) one or more recombination sites
and/or one or more topoisomerase recognition sites and/or one or more
topoisomerases - (d) one or more nucleic acid sequences that encode an
amino acid sequence that is capable of being cleaved by one or more proteases 
- (e) one or more nucleic acid sequences which encode one or more amino 
acid sequence tags - (f) one or more polyadenylation signals and/or one or 
more transcription termination regions. 

[00101] Exemplary arrangement VII: (a) one or more promoters - (b) one or
more recombination sites and/or one or more topoisomerase recognition sites 
and/or one or more topoisomerases - (c) one or more nucleic acid sequences 
which encode one or more amino acid sequence tags - (d) one or more 
polyadenylation signals and/or one or more transcription termination regions. 

[00102] Exemplary arrangement VIII: (a) one or more promoters - (b) one or 
more nucleic acid sequences of interest - (c) one or more recombination sites 
and/or one or more topoisomerase recognition sites and/or one or more 
topoisomerases - (d) one or more nucleic acid sequences which encode one or 
more amino acid sequence tags - (e) one or more polyadenylation signals 
and/or one or more transcription termination regions. 

[00103] In the foregoing exemplary arrangements, it will be understood by 
those skilled in the art that one or more additional elements may be included 
between any of the specifically listed elements, and/or that any of the 
specifically listed elements may be omitted. It will also be understood that 
many variations on these exemplary arrangements are possible (e.g., addition 
and/or omission of various elements) such that the nucleic acid molecules of 
the invention will allow the insertion of a nucleic acid sequence of interest 
and/or the production of a polynucleotide construct that encodes a desired 
fusion protein. 
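
As a rough illustration of how the exemplary arrangements tolerate added or omitted elements while fixing the relative order of those that remain, the following sketch models an arrangement as an ordered list. The element names (e.g., "cloning_site" standing in for a recombination site, topoisomerase recognition site or topoisomerase) are hypothetical labels chosen for this example, not terms defined in the specification.

```python
# Exemplary arrangement II, written as an ordered list of hypothetical labels.
ARRANGEMENT_II = [
    "promoter",
    "tag",
    "protease_site",
    "cloning_site",
    "sequence_of_interest",
    "terminator",
]


def follows_order(construct: list[str], arrangement: list[str]) -> bool:
    """True if the listed elements present in `construct` keep their relative order.

    Elements not named in the arrangement are ignored (extra elements allowed),
    and listed elements may be absent (omission allowed).
    """
    rank = {name: i for i, name in enumerate(arrangement)}
    ranks = [rank[e] for e in construct if e in rank]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


# Extra ribosome binding site added, protease site omitted: order still holds.
variant = ["promoter", "ribosome_binding_site", "tag",
           "cloning_site", "sequence_of_interest", "terminator"]
# Tag placed downstream of the cloning site: not arrangement II.
swapped = ["promoter", "cloning_site", "tag", "terminator"]

print(follows_order(variant, ARRANGEMENT_II))
print(follows_order(swapped, ARRANGEMENT_II))
```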

[00104] Persons of ordinary skill in the art will readily understand how close 
together, or how far apart, the elements of the nucleic acid molecules of the 
invention can be in order to permit the insertion of a nucleic acid sequence of 
interest and/or the production of a polynucleotide construct that encodes a 
desired fusion protein. For example, any two or more of the foregoing 
elements may be arranged within the nucleic acid molecules of the invention 
such that they are within about 500 nucleotides of one another. In certain
embodiments, any two or more elements of the nucleic acid molecules will be 
within about 400 nucleotides of one another, within about 300 nucleotides of 
one another, within about 200 nucleotides of one another, within about 100 
nucleotides of one another, within about 50 nucleotides of one another, within 
about 40 nucleotides of one another, within about 30 nucleotides of one 
another, within about 20 nucleotides of one another, within about 10 
nucleotides of one another, within about 5 nucleotides of one another, within 
about 4 nucleotides of one another, within about 3 nucleotides of one another, 
within about 2 nucleotides of one another, or within about 1 nucleotide of one 
another. The elements of the nucleic acid molecules of the invention may 
alternatively be directly adjacent to one another (e.g., with no nucleotides 
separating them), as long as such an arrangement permits the insertion of a 
nucleic acid sequence of interest and/or the production of a polynucleotide 
construct that encodes a desired fusion protein. 
[00105] It will also be appreciated that the nucleic acid sequence of interest will
preferably be designed such that, when it is inserted at or within 20
nucleotides of said one or more recombination sites or at or within 20 
nucleotides of said one or more topoisomerase recognition sites and/or at or 
within 20 nucleotides of the position of said one or more topoisomerases, the 
nucleic acid sequence of interest is in frame with the nucleic acid sequence 
tag. 
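
One way to picture the in-frame requirement is the following minimal sketch. It assumes, for illustration only, that the tag coding sequence begins at its first nucleotide, that a linker (e.g., the residual recombination or topoisomerase site) lies between the tag and the insert, and that the standard stop codons apply; the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str):
    """Yield successive complete codons from the start of seq."""
    usable = len(seq) - len(seq) % 3
    for i in range(0, usable, 3):
        yield seq[i:i + 3]


def junction_preserves_frame(tag_cds: str, linker: str) -> bool:
    """True if an insert placed after `linker` is read in the tag's frame.

    The frame is preserved when the tag plus linker length is a multiple of
    three and no in-frame stop codon occurs before the insert.
    """
    upstream = (tag_cds + linker).upper()
    if len(upstream) % 3 != 0:
        return False
    return not any(c in STOP_CODONS for c in codons(upstream))


tag_cds = "ATGGCTAGC"  # hypothetical 3-codon tag coding sequence
print(junction_preserves_frame(tag_cds, "GGATCC"))  # 6-nt linker
print(junction_preserves_frame(tag_cds, "GGATC"))   # 5-nt linker shifts the frame
print(junction_preserves_frame(tag_cds, "TAAGGA"))  # in-frame stop in the linker
```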

[00106] The nucleic acid molecules of the invention are useful, e.g., in the 
production of fusion proteins that comprise one or more amino acid sequence 
tags. The fusion protein may be, e.g., an N-terminal fusion protein (e.g., 
wherein an amino acid sequence tag is covalently attached at or near the
N-terminus of the amino acid sequence encoded by said nucleic acid sequence of
interest). The fusion protein may also be, e.g., a C-terminal fusion protein 
(e.g., wherein an amino acid sequence tag is covalently attached at or near the 
C-terminus of the amino acid sequence encoded by said nucleic acid sequence 
of interest). The fusion protein may also be, e.g., an N-terminal and C-terminal
fusion protein (e.g., wherein an amino acid sequence tag is covalently attached
at or near the N-terminus of the amino acid sequence encoded by said nucleic
acid sequence of interest and an amino acid sequence tag is covalently 
attached at or near the C-terminus of the amino acid sequence encoded by said 
nucleic acid sequence of interest). 

[00107] The nucleic acid molecules of the invention may comprise one or more 
(e.g., 2, 3, 4, 5, 6, 7, 8, etc.) recombination sites. As used herein, a 
recombination site is a recognition sequence on a nucleic acid molecule 
participating in an integration/recombination reaction by recombination 
proteins. Recombination sites are discrete sections or segments of nucleic acid 
on the participating nucleic acid molecules that are recognized and bound by a 
site-specific recombination protein during the initial stages of integration or 
recombination. For example, the recombination site for Cre recombinase is
loxP, which is a 34 base pair sequence composed of two 13 base pair inverted
repeats (serving as the recombinase binding sites) flanking an 8 base pair core 
sequence. See Fig. 1 of Sauer, B., Curr. Opin. Biotech. 5:521-527 (1994). 
Other examples of recognition sequences include the attB, attP, attL, and attR 
sequences described herein, and mutants, fragments, variants and derivatives
thereof, which are recognized by the recombination protein λ Int and by the
auxiliary proteins integration host factor (IHF), FIS and excisionase (Xis).
See Landy, Curr. Opin. Biotech. 3:699-707 (1993). 
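
The 13 + 8 + 13 organization of loxP can be checked with a short sketch. The sequence used is the commonly cited wild-type loxP sequence, which is an assumption here since the specification does not spell it out; the helper name is likewise illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Commonly cited wild-type loxP (assumption; not given in the text above).
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

# Two 13 bp arms flanking an 8 bp core.
left_arm, core, right_arm = LOXP[:13], LOXP[13:21], LOXP[21:]

print(len(LOXP), len(left_arm), len(core), len(right_arm))
# The arms are inverted repeats: one is the reverse complement of the other.
print(reverse_complement(left_arm) == right_arm)
# The core is not its own reverse complement, so the site has an orientation.
print(reverse_complement(core) != core)
```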

[00108] Recombination sites for use in the invention may be any nucleic acid 
sequence that can serve as a substrate in a recombination reaction. Such 
recombination sites may be wild-type or naturally occurring recombination 
sites or modified or mutant recombination sites. Examples of recombination 
sites for use in the invention include, but are not limited to, phage-lambda 
recombination sites (such as attP, attB, attL, and attR and mutants or 
derivatives thereof) and recombination sites from other bacteriophage such as 
phi80, P22, P2, 186, P4 and P1 (including lox sites such as loxP and loxP511).
Novel mutated att sites (e.g., attB1-10, attP1-10, attR1-10 and attL1-10) are
described in International Patent Application PCT/US00/05432, which is 
specifically incorporated herein by reference. Other recombination sites 
having unique specificity (i.e., a first site will recombine with its 
corresponding site and will not recombine with a second site having a different
specificity) are known to those skilled in the art and may be used to practice 
the present invention. 

[00109] Corresponding recombination proteins for these systems may be used 
in accordance with the invention with the indicated recombination sites. Other 
systems providing recombination sites and recombination proteins for use in 
the invention include the FLP/FRT system from Saccharomyces cerevisiae, 
the resolvase family (e.g., γδ, Tn3 resolvase, Hin, Gin and Cin), and IS231 and
other Bacillus thuringiensis transposable elements. Other suitable 
recombination systems for use in the present invention include the XerC and 
XerD recombinases and the psi, dif and cer recombination sites in E. coli. 
Other suitable recombination sites may be found in United States patent nos. 
5,851,808 and 6,410,317 which are specifically incorporated herein by 
reference. Preferred recombination proteins and mutant or modified 
recombination sites for use in the invention include those described in U.S. 
Patent Nos. 5,888,732, 6,171,861, 6,143,557, 6,270,969 and 6,277,608, and 
commonly owned, co-pending U.S. Application Nos. 09/438,358 (filed
11/12/99), 09/517,466 (filed 03/02/00), 09/695,065 (filed 10/25/00),
09/732,914 (filed 12/11/00), and international application Nos. WO 01/11058 
and WO 01/42509, the disclosures of all of which are incorporated herein by 
reference in their entireties, as well as those associated with the Gateway™ 
Cloning Technology and Echo™ Cloning Technology available from 
Invitrogen Corporation (Carlsbad, CA). 
[00110] The nucleic acid molecules of the invention may comprise one or more 
(e.g., 2, 3, 4, 5, 6, 7, 8, etc.) topoisomerase recognition sites and/or one or 
more topoisomerases. As used herein, a topoisomerase recognition sequence
(alternatively and equivalently referred to herein as a "topoisomerase
recognition site") is a particular sequence that a topoisomerase recognizes
and binds. Examples of topoisomerase recognition sites include, but are not
limited to, the sequence 5'-GCAACTT-3' that is recognized by E. coli
topoisomerase III (a type I topoisomerase); the sequence 5'-(C/T)CCTT-3',
which is a topoisomerase recognition site that is bound specifically by most
poxvirus topoisomerases, including vaccinia virus DNA topoisomerase I; and 
others that are known in the art as discussed elsewhere herein. 
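
A minimal sketch of locating such recognition sites on one strand of a sequence follows. It assumes the two sites read 5'-GCAACTT-3' and 5'-(C/T)CCTT-3'; the dictionary keys, function name and test sequence are hypothetical and chosen only for this example.

```python
import re

# Recognition sites as regular expressions; [CT] encodes the (C/T) position.
SITES = {
    "E. coli topoisomerase III": "GCAACTT",
    "poxvirus type IB (e.g., vaccinia)": "[CT]CCTT",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each site on the given strand.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    seq = seq.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in SITES.items()
    }


hits = find_sites("AAGCAACTTGGCCCTTAATCCTT")  # hypothetical test sequence
for name, positions in hits.items():
    print(name, positions)
```

Only the strand supplied is scanned; a fuller search would also scan the reverse complement.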
[00111] Topoisomerases are categorized as type I, including type IA and type 
IB topoisomerases, which cleave a single strand of a double stranded nucleic 
acid molecule, and type II topoisomerases (gyrases), which cleave both strands 
of a nucleic acid molecule. Type IA and IB topoisomerases cleave one strand 
of a nucleic acid molecule. Cleavage of a nucleic acid molecule by type IA 
topoisomerases generates a 5' phosphate and a 3' hydroxyl at the cleavage site, 
with the type IA topoisomerase covalently binding to the 5' terminus of a 
cleaved strand. In comparison, cleavage of a nucleic acid molecule by type IB 
topoisomerases generates a 3' phosphate and a 5' hydroxyl at the cleavage site, 
with the type IB topoisomerase covalently binding to the 3' terminus of a 
cleaved strand. As disclosed herein, type I and type II topoisomerases, as well 
as catalytic domains and mutant forms thereof, are useful for generating 
ds recombinant nucleic acid molecules covalently linked in both strands 
according to a method of the invention.
[00112] Type IA topoisomerases include E. coli topoisomerase I, E. coli
topoisomerase III, eukaryotic topoisomerase II, archaeal reverse gyrase, yeast
topoisomerase III, Drosophila topoisomerase III, human topoisomerase III,
Streptococcus pneumoniae topoisomerase III, and the like, including other
type IA topoisomerases (see Berger, Biochim. Biophys. Acta 1400:3-18, 1998;
DiGate and Marians, J. Biol. Chem. 264:17924-17930, 1989; Kim and Wang,
J. Biol. Chem. 267:17178-17185, 1992; Wilson et al., J. Biol. Chem.
275:1533-1540, 2000; Hanai et al., Proc. Natl. Acad. Sci., USA 93:3653-3657,
1996; U.S. Pat. No. 6,277,620, each of which is incorporated herein by
reference). E. coli topoisomerase III, which is a type IA topoisomerase that
recognizes, binds to and cleaves the sequence 5'-GCAACTT-3', can be
particularly useful in a method of the invention (Zhang et al., J. Biol. Chem. 
270:23700-23705, 1995, which is incorporated herein by reference). A 
homolog, the traE protein of plasmid RP4, has been described by Li et al., J. 
Biol. Chem. 272:19582-19587 (1997) and can also be used in the practice of 
the invention. A DNA-protein adduct is formed with the enzyme covalently
binding to the 5'-thymidine residue, with cleavage occurring between the two 
thymidine residues. 

[00113] Type IB topoisomerases include the nuclear type I topoisomerases 
present in all eukaryotic cells and those encoded by vaccinia and other cellular 
poxviruses (see Cheng et al., Cell 92:841-850, 1998, which is incorporated 
herein by reference). The eukaryotic type IB topoisomerases are exemplified
by those expressed in yeast, Drosophila and mammalian cells, including
human cells (see Caron and Wang, Adv. Pharmacol. 29B:271-297, 1994;
Gupta et al., Biochim. Biophys. Acta 1262:1-14, 1995, each of which is 
incorporated herein by reference; see, also, Berger, supra, 1998). Viral type IB 
topoisomerases are exemplified by those produced by the vertebrate 
poxviruses (vaccinia, Shope fibroma virus, ORF virus, fowlpox virus, and 
molluscum contagiosum virus), and the insect poxvirus (Amsacta moorei 
entomopoxvirus) (see Shuman, Biochim. Biophys. Acta 1400:321-337, 1998;
Petersen et al., Virology 230:197-206, 1997; Shuman and Prescott, Proc. Natl.
Acad. Sci., USA 84:7478-7482, 1987; Shuman, J. Biol. Chem. 269:32678-
32684, 1994; U.S. Pat. No. 5,766,891; PCT/US95/16099; PCT/US98/12372,
each of which is incorporated herein by reference; see, also, Cheng et al., 
supra, 1998). 

[00114] Type II topoisomerases include, for example, bacterial gyrase, bacterial
DNA topoisomerase IV, eukaryotic DNA topoisomerase II, and T-even phage
encoded DNA topoisomerases (Roca and Wang, Cell 71:833-840, 1992; Wang,
J. Biol. Chem. 266:6659-6662, 1991, each of which is incorporated herein by
reference; Berger, supra, 1998). Like the type IB topoisomerases, the type II
topoisomerases have both cleaving and ligating activities. In addition, like
type IB topoisomerase, substrate nucleic acid molecules can be prepared such
that the type II topoisomerase can form a covalent linkage to one strand at a
cleavage site. For example, calf thymus type II topoisomerase can cleave a
substrate nucleic acid molecule containing a 5' recessed topoisomerase 
recognition site positioned three nucleotides from the 5' end, resulting in 
dissociation of the three nucleotide sequence 5' to the cleavage site and 
covalent binding of the topoisomerase to the 5' terminus of the nucleic acid
molecule (Andersen et al., supra, 1991). Furthermore, upon contacting such a
type II topoisomerase charged nucleic acid molecule with a second nucleotide
sequence containing a 3' hydroxyl group, the type II topoisomerase can ligate
the sequences together, and then is released from the recombinant nucleic acid 
molecule. As such, type II topoisomerases also are useful in the nucleic acid 
molecules and methods of the invention. 
[00115] Structural analysis of topoisomerases indicates that the members of
each particular topoisomerase family, including type IA, type IB and type II
topoisomerases, share common structural features with other members of the 
family (Berger, supra, 1998). In addition, sequence analysis of various 
type IB topoisomerases indicates that the structures are highly conserved, 
particularly in the catalytic domain (Shuman, supra, 1998; Cheng et al., supra,
1998; Petersen et al., supra, 1997). For example, a domain comprising amino 
acids 81 to 314 of the 314 amino acid vaccinia topoisomerase shares 
substantial homology with other type IB topoisomerases, and the isolated 
domain has essentially the same activity as the full length topoisomerase,
although the isolated domain has a slower turnover rate and lower binding 
affinity to the recognition site (see Shuman, supra, 1998; Cheng et al., supra, 
1998). In addition, a mutant vaccinia topoisomerase, which is mutated in the 
amino terminal domain (at amino acid residues 70 and 72) displays identical 
properties as the full length topoisomerase (Cheng et al., supra, 1998). In fact, 
mutation analysis of vaccinia type IB topoisomerase reveals a large number of 
amino acid residues that can be mutated without affecting the activity of the 
topoisomerase, and has identified several amino acids that are required for 
activity (Shuman, supra, 1998). In view of the high homology shared among 
the vaccinia topoisomerase catalytic domain and the other type IB 
topoisomerases, and the detailed mutation analysis of vaccinia topoisomerase, 
it will be recognized that isolated catalytic domains of the type IB 
topoisomerases and type IB topoisomerases having various amino acid
mutations can be included with the nucleic acid molecules and methods of the 
invention. 






[00116] The various topoisomerases exhibit a range of sequence specificity. 
For example, type II topoisomerases can bind to a variety of sequences, but
cleave at a highly specific recognition site (see Andersen et al., J. Biol. Chem.
266:9203-9210, 1991, which is incorporated herein by reference). In
comparison, the type IB topoisomerases include site specific topoisomerases, 
which bind to and cleave a specific nucleotide sequence ("topoisomerase 
recognition site"). Upon cleavage of a nucleic acid molecule by a 
topoisomerase, for example, a type IB topoisomerase, the energy of the 
phosphodiester bond is conserved via the formation of a phosphotyrosyl 
linkage between a specific tyrosine residue in the topoisomerase and the 
3' nucleotide of the topoisomerase recognition site. Where the topoisomerase 
cleavage site is near the 3' terminus of the nucleic acid molecule, the 
downstream sequence (3' to the cleavage site) can dissociate, leaving a nucleic 
acid molecule having the topoisomerase covalently bound to the newly 
generated 3' end. 

[00117] The nucleic acid molecules of the invention are useful, e.g., for the 
production of fusion proteins. As used herein, the term "fusion protein" is 
intended to include any polypeptide which contains amino acids derived from 
at least two different polypeptides. The nucleic acid molecules of the 
invention are especially useful, e.g., for producing fusion proteins comprising 
(i) one or more amino acid sequence tags, and (ii) one or more amino acid 
sequences encoded by one or more nucleic acid sequences of interest.
[00118] The invention also includes vectors comprising any of the nucleic acid 
molecules described herein. As used herein, a vector is a nucleic acid 
molecule (preferably DNA) that provides a useful biological or biochemical 
property to an insert. Examples include plasmids, phages, autonomously 
replicating sequences (ARS), centromeres, and other sequences which are able 
to replicate or be replicated in vitro or in a host cell, or to convey a desired 
nucleic acid segment to a desired location within a host cell. A vector can
have one or more restriction endonuclease recognition sites at which the 
sequences can be cut in a determinable fashion without loss of an essential 
biological function of the vector, and into which a nucleic acid fragment can 
be spliced in order to bring about its replication and cloning. Vectors can
further provide primer sites, e.g., for PCR, transcriptional and/or translational 
initiation and/or regulation sites, recombinational signals, replicons, selectable 
markers, etc. Clearly, methods of inserting a desired nucleic acid fragment 
which do not require the use of recombination, transpositions or restriction 
enzymes (such as, but not limited to, UDG cloning of PCR fragments (U.S. 
Patent No. 5,334,575, entirely incorporated herein by reference), TA Cloning® 
brand PCR cloning (Invitrogen Corporation, Carlsbad, CA) (also known as 
direct ligation cloning), and the like) can also be applied to clone a fragment 
into a cloning vector to be used according to the present invention. The 
cloning vector can further contain one or more selectable markers suitable for 
use in the identification of cells transformed with the cloning vector. 
[00119] Exemplary vectors that are encompassed by the present invention 
include, e.g., pET104-DEST (SEQ ID NO:1) (Fig. 1), pET104/GW/lacZ (Fig.
2), pET104/D-TOPO (SEQ ID NO:2) (Fig. 3), pET104/D/lacZ (Fig. 4),
pcDNA6/Biotag™-DEST (SEQ ID NO:3) (Fig. 5), pcDNA6/Biotag™-
GW/lacZ (Fig. 6), pcDNA6/Biotag™/D-TOPO (SEQ ID NO:4) (Fig. 7),
pcDNA6/Biotag™/lacZ (Fig. 8), pMT/Biotag™-DEST (SEQ ID NO:5) (Fig.
9), and pMT/Biotag™/GW-lacZ (Fig. 10).
[00120] The invention also encompasses nucleic acid molecules having nucleic 
acid sequences that are at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 
96%, 97%, 98% or 99% identical to at least 25, 50, 100, 200, 300, 400, 500, 
600, 700, 800, 900, 1000, 2000, 3000 or 4000 contiguous nucleotides of the 
exemplary vectors pET104-DEST (SEQ ID NO:1), pET104/D-TOPO (SEQ
ID NO:2), pcDNA6/Biotag™-DEST (SEQ ID NO:3), pcDNA6/Biotag™/D-
TOPO (SEQ ID NO:4) and pMT/Biotag™-DEST (SEQ ID NO:5). The
invention also encompasses nucleic acid molecules comprising one or more 
nucleic acid sequences which encode an amino acid sequence tag, wherein 
said one or more nucleic acid sequences are at least 80%, 85%, 90%, 91%, 
92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to at least 25, 50, 75, 
100, 125, 150, 175 or 200 contiguous nucleotides of any one of SEQ ID
NOs:11-15.

[00121] By a nucleic acid molecule having a nucleotide sequence at least, for 
example, 80% "identical" to a reference nucleotide sequence it is intended that 
the nucleotide sequence of the nucleic acid molecule is identical to the 
reference sequence except that the nucleotide sequence may include up to 20 
nucleotide alterations per each 100 nucleotides of the nucleotide sequence of 
the reference nucleic acid molecule. In other words, to obtain a nucleic acid 
molecule having a nucleotide sequence at least 80% identical to a reference 
nucleotide sequence, up to 20% of the nucleotides in the reference sequence 
may be deleted or substituted with another nucleotide, or a number of 
nucleotides, up to 20% of the total nucleotides in the reference sequence, may 
be inserted into the reference sequence. These alterations of the reference 
sequence may occur, e.g., at the 5' or 3' ends of the reference nucleotide 
sequence and/or anywhere between those terminal positions, interspersed 
either individually among nucleotides in the reference sequence and/or in one 
or more contiguous groups within the reference sequence. 
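
The definition above amounts to a simple bound on the number of alterations. The sketch below, with hypothetical function names, computes that bound using integer arithmetic and treats deletions, substitutions and insertions alike as "alterations" counted against the reference length, as the paragraph does.

```python
def max_alterations(reference_length: int, percent_identity: int) -> int:
    """Largest number of nucleotide alterations compatible with the threshold.

    For "at least P% identical", up to (100 - P)% of the reference
    nucleotides may be altered.
    """
    return (100 - percent_identity) * reference_length // 100


def meets_identity(reference_length: int, alterations: int,
                   percent_identity: int) -> bool:
    """True if a sequence with this many alterations satisfies the threshold."""
    return alterations <= max_alterations(reference_length, percent_identity)


print(max_alterations(100, 80))     # the 80% case worked through in the text
print(max_alterations(200, 95))
print(meets_identity(100, 21, 80))  # one alteration too many
```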
[00122] As a practical matter, whether any particular nucleic acid molecule is
at least 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% 
identical to, for instance, a specified number of contiguous nucleotides of the 
nucleotide sequences shown in SEQ ID NOs:1-5 and 11-15 can be determined
conventionally using known computer programs such as the Bestfit program 
(Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics 
Computer Group, University Research Park, 575 Science Drive, Madison, WI 
53711). Bestfit uses the local homology algorithm of Smith and Waterman, 
Advances in Applied Mathematics 2: 482-489 (1981), to find the best segment 
of homology between two sequences. When using Bestfit or any other 
sequence alignment program to determine whether a particular sequence is, for 
instance, 95% identical to a reference sequence according to the present 
invention, the parameters are set, of course, such that the percentage of 
identity is calculated over the full length of the reference nucleotide sequence 
and that gaps in homology of up to 5% of the total number of nucleotides in
the reference sequence are allowed. 
[00123] A preferred method for determining the best overall match between a
query sequence (a sequence of the present invention) and a subject sequence,
also referred to as a global sequence alignment, uses the
FASTDB computer program based on the algorithm of Brutlag et al., Comp.
App. Biosci. 6:237-245 (1990). In a sequence alignment, the query and
subject sequences are both DNA sequences. An RNA sequence can be
compared by converting U's to T's. The result of said global sequence
alignment is in percent identity. Preferred parameters used in a FASTDB
alignment of DNA sequences to calculate percent identity are:
Matrix=Unitary, k-tuple=4, Mismatch Penalty=1, Joining Penalty=30,
Randomization Group Length=0, Cutoff Score=1, Gap Penalty=5, Gap Size
Penalty=0.05, Window Size=500 or the length of the subject nucleotide 
sequence, whichever is shorter. 
[00124] If the subject sequence is shorter than the query sequence because of 5'
or 3' deletions, not because of internal deletions, a manual correction must be
made to the results. This is because the FASTDB program does not account
for 5' and 3' truncations of the subject sequence when calculating percent
identity. For subject sequences truncated at the 5' or 3' ends, relative to the 
query sequence, the percent identity is corrected by calculating the number of 
bases of the query sequence that are 5' and 3' of the subject sequence, which 
are not matched/aligned, as a percent of the total bases of the query sequence. 
Whether a nucleotide is matched/aligned is determined by the results of the 
FASTDB sequence alignment. This percentage is then subtracted from the 
percent identity, calculated by the above FASTDB program using the 
specified parameters, to arrive at a final percent identity score. This corrected 
score is what is used for the purposes of the present invention. Only bases 
outside the 5' and 3' bases of the subject sequence, as displayed by the 
FASTDB alignment, which are not matched/aligned with the query sequence 
are calculated for the purposes of manually adjusting the percent identity 
score. 
[00125] For example, a 90 base subject sequence is aligned to a 100 base query
sequence to determine percent identity. The deletions occur at the 5' end of 
the subject sequence and, therefore, the FASTDB alignment does not show a 
malalignment of the first 10 bases at the 5' end. The 10 unpaired bases 
represent 10% of the sequence (number of bases at the 5' and 3' ends not 
matched/total number of bases in the query sequence), so 10% is subtracted 
from the percent identity score calculated by the FASTDB program. If the 
remaining 90 bases were perfectly matched the final percent identity would be 
90%. In another example, a 90 base subject sequence is compared with a 100
base query sequence. This time the deletions are internal, so that there are no
bases on the 5' or 3' ends of the subject sequence which are not
matched/aligned with the query. In this case, the percent identity calculated
by FASTDB is not manually corrected. Once again, only bases 5' and 3' of the
subject sequence which are not matched/aligned with the query sequence are 
manually corrected for. No other manual corrections are to be made for the 
purposes of the present invention. 
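
The manual correction described above reduces to one subtraction. The following sketch is an illustration only: the function and parameter names are hypothetical, the FASTDB-reported percent identity is taken as an input rather than computed, and the value of 90.0 used for the internal-deletion case is a made-up figure since the text does not state what FASTDB would report.

```python
def corrected_percent_identity(fastdb_percent: float, query_length: int,
                               unmatched_5prime: int,
                               unmatched_3prime: int) -> float:
    """Apply the manual correction for 5' and 3' truncations of the subject.

    Query bases lying 5' or 3' of the subject that are not matched/aligned
    are expressed as a percentage of the query length and subtracted from
    the percent identity reported by the alignment program.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    overhang = unmatched_5prime + unmatched_3prime
    penalty = 100.0 * overhang / query_length
    return fastdb_percent - penalty


# First example: 90-base subject, 100-base query, 10 unmatched bases at the
# 5' end, remaining 90 bases perfectly matched.
print(corrected_percent_identity(100.0, 100, 10, 0))

# Second example: deletions are internal, so nothing is subtracted.
print(corrected_percent_identity(90.0, 100, 0, 0))
```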
[00126] The invention also includes host cells comprising any of the nucleic 
acid molecules and/or vectors described herein. As used herein, a host cell is 
any prokaryotic or eukaryotic organism that is a recipient of a replicable 
expression vector, cloning vector or any nucleic acid molecule. As used 
herein, the terms "host," "host cell," "recombinant host" and "recombinant host 
cell" may be used interchangeably. Representative host cells that may be used 
with the invention include, but are not limited to, bacterial cells, yeast cells, 
plant cells and animal cells. Preferred bacterial host cells include Escherichia 
spp. cells (particularly E. coli cells and most particularly E. coli strains 
DH10B, Stbl2, DH5, DB3, DB3.1 (preferably E. coli LIBRARY 
EFFICIENCY® DB3.1™ Competent Cells; Invitrogen Corporation, Carlsbad, 
CA), DB4 and DB5 (see U.S. Application No. 09/518,188, filed March 2, 
2000, the disclosure of which is incorporated by reference herein in its 
entirety)), Bacillus spp. cells (particularly B. subtilis and B. megaterium cells), 
Streptomyces spp. cells, Erwinia spp. cells, Klebsiella spp. cells, Serratia spp. 
cells (particularly S. marcescens cells), Pseudomonas spp. cells (particularly P. 
aeruginosa cells), and Salmonella spp. cells (particularly S. typhimurium and 
S. typhi cells). Preferred animal host cells include insect cells (most 
particularly Drosophila melanogaster cells, Spodoptera frugiperda Sf9 and 
Sf21 cells and Trichoplusia High-Five cells), nematode cells (particularly C. 
elegans cells), avian cells, amphibian cells (particularly Xenopus laevis cells), 
reptilian cells, and mammalian cells (most particularly NIH3T3, CHO, COS, 
VERO, BHK and human cells). Preferred yeast host cells include 
Saccharomyces cerevisiae cells and Pichia pastoris cells. These and other 
suitable host cells are available commercially, for example from Invitrogen 
Corporation (Carlsbad, California), American Type Culture Collection 
(Manassas, Virginia), and Agricultural Research Culture Collection (NRRL; 
Peoria, Illinois). 

[00127] The nucleic acid molecules and/or vectors of the invention may be 
introduced into host cells using well known techniques of infection, 
transduction, electroporation, transfection, and transformation. The nucleic 
acid molecules and/or vectors of the invention may be introduced alone or in 
conjunction with other nucleic acid molecules and/or vectors and/or 
proteins, peptides or RNAs. Alternatively, the nucleic acid molecules and/or 
vectors of the invention may be introduced into host cells as a precipitate, such 
as a calcium phosphate precipitate, or in a complex with a lipid. 
Electroporation also may be used to introduce the nucleic acid molecules 
and/or vectors of the invention into a host. Likewise, such molecules may be 
introduced into chemically competent cells such as E. coli. If the vector is a 
virus, it may be packaged in vitro or introduced into a packaging cell and the 
packaged virus may be transduced into cells. Hence, a wide variety of 
techniques suitable for introducing the nucleic acid molecules and/or vectors 
of the invention into host cells are well known and routine to those of skill in 
the art. Such techniques are reviewed at length, for example, in Sambrook, J., 
et al., Molecular Cloning, a Laboratory Manual, 2nd Ed., Cold Spring Harbor, 
NY: Cold Spring Harbor Laboratory Press, pp. 16.30-16.55 (1989), Watson, 
J.D., et al., Recombinant DNA, 2nd Ed., New York: W.H. Freeman and Co., 
pp. 213-234 (1992), and Winnacker, E.-L., From Genes to Clones, New York: 
that detail these techniques and which are incorporated by reference herein in 
their entireties for their relevant disclosures. 
[00128] The present invention also includes methods of producing a 
polynucleotide construct that encodes a fusion protein that comprises one or 
more amino acid sequence tags. Such methods may be accomplished in vivo 
(e.g., within a cell) or in vitro (outside a cell). 
[00129] According to one embodiment, the invention includes a method of 
producing a polynucleotide construct that encodes a fusion protein that 
comprises one or more amino acid sequence tags, said method comprising: (a) 
obtaining a first nucleic acid molecule comprising (i) a nucleotide sequence of 
interest and (ii) at least a first recombination site; (b) obtaining a second 
nucleic acid molecule comprising (i) one or more nucleic acid sequences 
which encode one or more amino acid sequence tags, and (ii) at least a second 
recombination site; and (c) combining said first nucleic acid molecule with 
said second nucleic acid molecule under conditions sufficient to cause 
recombination of at least said first and second recombination sites thereby 
producing a polynucleotide construct that encodes a fusion protein that 
comprises one or more amino acid sequence tags. 
[00130] In certain embodiments, the methods of the invention comprise: (a) 
obtaining a first nucleic acid molecule comprising a nucleotide sequence of 
interest flanked by at least a first and at least a second recombination sites that 
do not recombine with each other; (b) obtaining a second nucleic acid 
molecule comprising: (i) at least a third and fourth recombination sites that do 
not recombine with each other; and (ii) one or more nucleic acid sequences 
which encode one or more amino acid sequence tags; and (c) contacting said 
first nucleic acid molecule with said second nucleic acid molecule under 
conditions favoring recombination between said first and third and between 
said second and fourth recombination sites, thereby producing a product 
polynucleotide construct; wherein said product polynucleotide construct 
encodes a fusion protein comprising: (i) said amino acid sequence tag; and (ii) 
the amino acid sequence encoded by said nucleotide sequence of interest. 
[00131] In other embodiments, the methods of the invention comprise: (a) 
obtaining a first nucleic acid molecule comprising a nucleotide sequence of 
interest; (b) obtaining a second nucleic acid molecule comprising at least two 
topoisomerase recognition sites, at least one topoisomerase, and at least one 
nucleic acid sequence which encodes one or more amino acid sequence tags; 
(c) mixing said first nucleic acid molecule with said second nucleic acid 
molecule; and (d) incubating said mixture under conditions such that said first 
nucleic acid molecule is inserted into said second nucleic acid molecule 
between said at least two topoisomerase recognition sites, thereby producing a 
product polynucleotide construct; wherein said product polynucleotide 
construct encodes a fusion protein comprising: (i) said amino acid sequence 
tag; and (ii) the amino acid sequence encoded by said nucleotide sequence of 
interest. 

[00132] In other embodiments, the methods of the invention comprise: (a) 
obtaining a first nucleic acid molecule comprising a nucleotide sequence of 
interest; (b) obtaining a second nucleic acid molecule comprising (i) at least a 
first topoisomerase recognition site, flanked by (ii) at least a first 
recombination site, and (iii) at least a second topoisomerase recognition site 
flanked by (iv) at least a second recombination site, wherein said first and 
second recombination sites do not recombine with each other, and (v) at least 
one topoisomerase; (c) obtaining a third nucleic acid molecule comprising: (i) 
at least a third and fourth recombination sites that do not recombine with each 
other; and (ii) one or more nucleic acid sequences which encode one or more 
amino acid sequence tags; (d) mixing said first nucleic acid molecule with said 
second nucleic acid molecule; (e) incubating said mixture under conditions 
such that said first nucleic acid molecule is inserted into said second nucleic 
acid molecule between said at least two topoisomerase recognition sites, 
thereby producing a first product polynucleotide construct; (f) contacting said 
first product polynucleotide construct with said third nucleic acid molecule 
under conditions favoring recombination between said first and third and 
between said second and fourth recombination sites, thereby producing a 
second product polynucleotide construct; wherein said second product 
polynucleotide construct encodes a fusion protein comprising: (i) said amino 
acid sequence tag; and (ii) the amino acid sequence encoded by said 
nucleotide sequence of interest. 
[00133] In particular embodiments of the invention, one or more of the nucleic 
acid molecules that are used in the practice of the methods will further 
comprise a nucleic acid sequence that encodes an amino acid sequence that is 
capable of being cleaved by one or more proteases, and wherein the product 
polynucleotide constructs encode a fusion protein comprising: (i) said amino 
acid sequence that is capable of being cleaved by one or more proteases, 
flanked on one side by (ii) an amino acid sequence tag, and on the other side 
by (iii) the amino acid sequence encoded by a nucleotide sequence of interest. 
Any of the amino acid sequences that are capable of being cleaved by one or 
more proteases, as described elsewhere herein, can be used with the methods 
of the invention. In a preferred embodiment, the amino acid sequence that is 
capable of being cleaved by one or more proteases is an amino acid sequence 
that is capable of being cleaved by enterokinase. 
[00134] The methods of the invention involve the use of nucleic acid molecules 
comprising one or more nucleic acid sequences which encode one or more 
amino acid sequence tags. Any of the nucleic acid sequences, described 
elsewhere herein, which encode an amino acid sequence tag, can be used in 
the context of the methods of the invention. In certain embodiments of the 
invention, the amino acid sequence tag is an amino acid sequence that is 
capable of being post-translationally modified. For example, the amino acid 
sequence tag may be an amino acid sequence that is capable of being 
biotinylated. 

[00135] Any of the nucleic acid molecules, vectors, and host cells described 
herein, including any variations or modifications of such nucleic acid 
molecules, vectors, and host cells, can be included in the practice of the 
methods of the invention. The nucleic acid molecules that are used in the 
practice of the methods of the invention may be linear, or circular. If a linear 
nucleic acid molecule is used, the ends of the molecule may be blunt ended or, 
alternatively, may have one or more overhang ends. The nucleic acid 
molecules that are used in the practice of the methods of the invention may be 
PCR products. 

[00136] The methods of the invention may further comprise inserting a product 
polynucleotide construct into a host cell. 
[00137] In certain embodiments, the methods of the invention comprise 
contacting a first nucleic acid molecule comprising a first and a second 
recombination site with a second nucleic acid molecule comprising a third and 
a fourth recombination site under conditions favoring recombination between 
a first and third and between a second and fourth recombination sites. 
[00138] Exemplary recombination sites included within the nucleic acid 
molecules that are used in the practice of the methods of the invention include, 
but are not limited to, (a) attB sites, (b) attP sites, (c) attL sites, (d) attR sites, 
(e) lox sites, (f) psi sites, (g) dif sites, (h) cer sites, (i) frt sites, and mutants, 
variants, and derivatives of the recombination sites of (a), (b), (c), (d), (e), (f), 
(g), (h), or (i) which retain the ability to undergo recombination. 
[00139] In particular embodiments, said first and said second nucleic acid 
molecules are combined in the presence of at least one recombination protein. 
Exemplary recombination proteins that can be used in the methods of the 
invention include, e.g., Cre, Int, IHF, Xis, Fis, Hin, Gin, Cin, Tn3 resolvase, 
TndX, XerC and XerD. 
[00140] Methods for combining nucleic acid molecules by recombination at 
particular sites are known in the art. Such methods include, e.g., 
recombinational cloning methods. 
[00141] Cloning systems that utilize recombination at defined recombination 
sites have been previously described in U.S. Patent Nos. 5,888,732, 6,143,557, 
6,171,861, 6,270,969, and 6,277,608, and in commonly owned, co-pending 
U.S. Application No. 10/005,876 (filed 12/07/01), which are specifically 
incorporated herein by reference. In brief, the Gateway™ Cloning System, 
described in this application and the applications referred to in the related 
applications section, utilizes vectors that contain at least one and preferably at 
least two different site-specific recombination sites based on the bacteriophage 
lambda system (e.g., att1 and att2) that are mutated from the wild-type (att0) 
sites. Each mutated site has a unique specificity for its cognate partner att site 
of the same type (for example attB1 with attP1, or attL1 with attR1) and will 
not cross-react with recombination sites of the other mutant type or with the 
wild-type att0 site. Nucleic acid fragments flanked by recombination sites are 
cloned and subcloned using the Gateway™ system by replacing a selectable 
marker (for example, ccdB) flanked by att sites on the recipient plasmid 
molecule, sometimes termed the Destination Vector. Desired clones are then 
selected by transformation of a ccdB sensitive host strain and positive 
selection for a marker on the recipient molecule. Similar strategies for 
negative selection (e.g., use of toxic genes) can be used in other organisms, 
for example thymidine kinase (TK) in mammals and insects. 
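The pairing rule described in this paragraph (a mutated site reacts only with its cognate partner of the same specificity) can be expressed as a small predicate. This is a minimal sketch under stated assumptions: the function names are hypothetical, site names are assumed to follow the pattern `att` + type letter + specificity number, and the rule that attB pairs with attP and attL pairs with attR is taken from the examples given above (attB1 with attP1, attL1 with attR1).

```python
import re

# Reactive type pairings assumed from the examples in the text:
# attB with attP, and attL with attR.
REACTIVE_TYPES = {frozenset("BP"), frozenset("LR")}


def parse_att(site):
    """Split a name such as 'attB1' into its type ('B') and specificity ('1')."""
    match = re.fullmatch(r"att([BPLR])(\d+)", site)
    if match is None:
        raise ValueError(f"not a recognized att site name: {site!r}")
    return match.group(1), match.group(2)


def can_recombine(site_a, site_b):
    """Return True if the two sites are cognate partners of the same specificity."""
    type_a, spec_a = parse_att(site_a)
    type_b, spec_b = parse_att(site_b)
    return frozenset((type_a, type_b)) in REACTIVE_TYPES and spec_a == spec_b


print(can_recombine("attB1", "attP1"))  # True
print(can_recombine("attL1", "attR1"))  # True
print(can_recombine("attB1", "attP2"))  # False: different specificity
print(can_recombine("attL2", "attR0"))  # False: no reaction with wild-type att0
```

The specificity number is compared as text, so a wild-type site written with the suffix `0` never matches a mutated site.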
[00142] Mutating specific residues in the core region of the att site can generate 
a large number of different att sites. As with the att1 and att2 sites utilized in 
Gateway™, each additional mutation potentially creates a novel att site with 
unique specificity that will recombine only with its cognate partner att site 
bearing the same mutation and will not cross-react with any other mutant or 
wild-type att site. Novel mutated att sites (e.g., attB 1-10, attP 1-10, attR 
1-10 and attL 1-10) are described in International Patent Application 
PCT/US00/05432, which is specifically incorporated herein by reference. 
Other recombination sites having unique specificity (i.e., a first site will 
recombine with its corresponding site and will not recombine or not 
substantially recombine with a second site having a different specificity) may 
be used to practice the present invention. Examples of suitable recombination 
sites include, but are not limited to, loxP sites and derivatives such as loxP511 
(see U.S. Patent No. 5,851,808), frt sites and derivatives, dif sites and 
derivatives, psi sites and derivatives and cer sites and derivatives. The present 
invention provides novel methods using such recombination sites to join or 
link multiple nucleic acid molecules or segments and more specifically to 
clone such multiple segments into one or more vectors containing one or more 
recombination sites (such as any Gateway™ Vector including Destination 
Vectors). 
[00143] In certain embodiments, the methods of the invention comprise (a) 
mixing a first nucleic acid molecule with a second nucleic acid molecule, said 
second nucleic acid molecule comprising at least two topoisomerase 
recognition sites and at least one topoisomerase, and (b) incubating the 
mixture under conditions such that said first nucleic acid molecule is inserted 
into said second nucleic acid molecule between said at least two 
topoisomerase recognition sites. 
[00144] Methods for inserting a first nucleic acid molecule into a second 
nucleic acid molecule between topoisomerase recognition sites thereby 
producing a product polynucleotide construct, are known in the art. 
Exemplary methods are known in the art as Topoisomerase cloning, TOPO® 
cloning, and Directional TOPO® cloning. As used herein, the term 
"topoisomerase-mediated cloning" is intended to mean any method of 
combining two or more nucleic acid molecules using at least one 
topoisomerase recognition site on one or more of the nucleic acid molecules 
and one or more topoisomerase. Exemplary methods are described in 
commonly owned, co-pending U.S. Application No. 10/005,876 (filed 
12/07/01), the disclosure of which is incorporated herein by reference in its 
entirety. 

[00145] A method for generating a product polynucleotide construct using 
topoisomerase cloning can be performed, for example, by contacting a first 
nucleic acid molecule having a first end and a second end, wherein, at the first 
end or second end or both, the first nucleic acid molecule has a topoisomerase 
recognition site (or cleavage product thereof) at or near the 3' terminus; at least 
a second nucleic acid molecule having a first end and a second end, wherein, 
at the first end or second end or both, the at least second double stranded 
nucleotide sequence has a topoisomerase recognition site (or cleavage product 
thereof) at or near a 3' terminus; and at least one site specific topoisomerase 
(e.g., a type IA and/or a type IB topoisomerase), under conditions such that all 
components are in contact and the topoisomerase can effect its activity. 
[00146] In one embodiment, the method is performed by contacting a first 
nucleic acid molecule and a second (or other) nucleic acid molecule, each of 
which has a topoisomerase recognition site, or a cleavage product thereof, at 
the 3' termini or at the 5' termini of two ends to be covalently linked. In 
another embodiment, the method is performed by contacting a first nucleic 
acid molecule having a topoisomerase recognition site, or cleavage product 
thereof, at the 5' terminus and the 3' terminus of at least one end, and a second 
(or other) nucleic acid molecule having a 3' hydroxyl group and a 5' hydroxyl 
group at the end to be linked to the end of the first nucleic acid molecule 
containing the recognition sites. As disclosed herein, the methods can be 
performed using any number of nucleic acid molecules having various 
combinations of termini and ends. 
[00147] Methods of the invention may involve the use of a nucleic acid molecule 
that comprises at least one topoisomerase. The topoisomerase may be, e.g., a 
type I topoisomerase. More specifically, the type I topoisomerase may be a 
type IB topoisomerase. Where a type IB topoisomerase is used, the type IB 
topoisomerase may be a topoisomerase selected, e.g., from the group 
consisting of eukaryotic nuclear type I topoisomerase and a poxvirus 
topoisomerase. Poxvirus topoisomerases may be produced by or isolated from 
a virus selected from the group consisting of vaccinia virus, Shope fibroma 
virus, ORF virus, fowlpox virus, molluscum contagiosum virus and Amsacta 
moorei entomopoxvirus. 
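As an illustration of how a topoisomerase recognition site might be located in a sequence, the sketch below scans one strand for the pentameric motif 5'-(C/T)CCTT-3'. That motif is the recognition sequence commonly reported for vaccinia virus DNA topoisomerase I; it is an assumption brought in for this example and is not stated in this section. The function name and test sequences are hypothetical.

```python
import re

# Assumed recognition motif for vaccinia topoisomerase I: 5'-(C/T)CCTT-3'.
# A lookahead is used so that overlapping motifs (e.g. CCCTTCCTT) are all found.
RECOGNITION_SITE = re.compile(r"(?=([CT]CCTT))")


def find_recognition_sites(sequence):
    """Return 0-based start positions of (C/T)CCTT motifs on the given strand."""
    return [match.start() for match in RECOGNITION_SITE.finditer(sequence.upper())]


print(find_recognition_sites("AAGCCCTTAAGGGC"))  # [3]
print(find_recognition_sites("CCCTTCCTT"))       # [0, 4]
print(find_recognition_sites("AAAAGGGG"))        # []
```

Only the strand supplied is scanned; a fuller treatment would also check the reverse complement.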
[00148] The present invention includes methods for producing a polynucleotide 
construct that encodes a fusion protein that comprises one or more amino acid 
sequence tags, using, for example, recombinational cloning or topoisomerase- 
mediated cloning. The methods of the invention may also involve the use of a 
combination of recombinational cloning and topoisomerase-mediated cloning. 
For example, the invention includes methods comprising the successive use of 
one or more recombinational cloning steps followed by one or more 
topoisomerase-mediated cloning steps. Alternatively, the invention also 
includes methods comprising the successive use of one or more 
topoisomerase-mediated cloning steps followed by one or more 
recombinational cloning steps. Alternatively, the invention includes methods 
comprising the use of recombinational cloning and topoisomerase-mediated 
cloning in the same cloning step. 
[00149] One example of the use of topoisomerase-mediated cloning followed 
by recombinational cloning to produce a polynucleotide construct that encodes 
a fusion protein capable of being post-translationally modified or that is 
capable of being recognized by an antibody (or fragment thereof) or other 
specific binding reagent, is as follows. A first nucleic acid molecule 
comprising a nucleotide sequence of interest is mixed with a second nucleic 
acid molecule comprising: (i) at least a first topoisomerase recognition site 
flanked by (ii) at least a first recombination site, and (iii) at least a second 
topoisomerase recognition site flanked by (iv) at least a second recombination 
site, wherein said first and second recombination sites do not recombine with 
each other, and (v) at least one topoisomerase. The mixture is incubated under 
conditions such that said first nucleic acid molecule is inserted into said 
second nucleic acid molecule between said at least two topoisomerase 
recognition sites, thereby producing a first product polynucleotide construct. 
The first product polynucleotide construct is then brought into contact with a 
third nucleic acid molecule comprising: (i) at least a third and fourth 
recombination sites that do not recombine with each other and (ii) one or more 
nucleic acid sequences which encode one or more amino acid sequence tags. 
The first product polynucleotide construct is contacted with said third nucleic 
acid molecule under conditions favoring recombination between said first and 
third and between said second and fourth recombination sites, thereby 
producing a second product polynucleotide construct. According to this 
exemplary method, said second polynucleotide construct will encode a fusion 
protein comprising: (i) said amino acid sequence tag, and (ii) the amino acid 
sequence encoded by said nucleotide sequence of interest. 
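The two-step scheme of this example can be followed with a toy bookkeeping model. The sketch below treats each nucleic acid molecule as an ordered list of named features; the feature names (`site1` to `site4`, `TOPO`, `GOI`, `tag`, `ccdB`) and both helper functions are hypothetical and introduced only for illustration. Placing a `ccdB` marker between the third and fourth sites is likewise an assumption. The model tracks only the order of elements and does not represent any real sequence or enzyme chemistry.

```python
def topo_insert(vector, insert):
    """Return `vector` with `insert` placed between its two 'TOPO' markers."""
    first = vector.index("TOPO")
    second = vector.index("TOPO", first + 1)
    return vector[:first] + [insert] + vector[second + 1:]


def recombine(donor, recipient, pairs):
    """Move the donor segment lying between its two sites into the recipient.

    `pairs` lists the (donor_site, recipient_site) partners in order.
    Whatever lies between the two recipient sites is replaced.
    """
    (d1, r1), (d2, r2) = pairs
    start, end = donor.index(d1), donor.index(d2)
    left, right = recipient.index(r1), recipient.index(r2)
    segment = donor[start + 1:end]
    return (recipient[:left] + [f"{d1}x{r1}"] + segment
            + [f"{d2}x{r2}"] + recipient[right + 1:])


# Second molecule: two topoisomerase sites flanked by recombination sites.
second_molecule = ["site1", "TOPO", "TOPO", "site2"]
# Third molecule: tag-encoding sequence plus the third and fourth sites.
third_molecule = ["tag", "site3", "ccdB", "site4"]

# Step 1: topoisomerase-mediated insertion of the sequence of interest (GOI).
first_product = topo_insert(second_molecule, "GOI")
# Step 2: recombination between site1/site3 and between site2/site4.
second_product = recombine(first_product, third_molecule,
                           [("site1", "site3"), ("site2", "site4")])

print(first_product)   # ['site1', 'GOI', 'site2']
print(second_product)  # ['tag', 'site1xsite3', 'GOI', 'site2xsite4']
```

In the final list the tag-encoding element and the sequence of interest sit on the same molecule, which is the arrangement needed for the construct to encode a fusion protein.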
[00150] Another example of the use of topoisomerase-mediated cloning 
followed by recombinational cloning to produce a polynucleotide construct 
that encodes a fusion protein that comprises an amino acid sequence tag, is as 
follows: A first nucleic acid molecule comprising a nucleotide sequence of 
interest is mixed with a second nucleic acid molecule comprising: (i) at least a 
first topoisomerase recognition site flanked by (ii) at least a first 
recombination site, and (iii) at least a second topoisomerase recognition site 
flanked by (iv) at least a second recombination site, wherein said first and 
second recombination sites do not recombine with each other, (v) one or more 
nucleic acid sequences which encode one or more amino acid sequence tags, 
and (vi) at least one topoisomerase. The mixture is incubated under conditions 
such that said first nucleic acid molecule is inserted into said second nucleic 
acid molecule between said at least two topoisomerase recognition sites, 
thereby producing a first product polynucleotide construct. The first product 
polynucleotide construct is then brought into contact with a third nucleic acid 
molecule comprising: (i) at least a third and fourth recombination sites that do 
not recombine with each other. The first product polynucleotide construct is 
contacted with said third nucleic acid molecule under conditions favoring 
recombination between said first and third and between said second and fourth 
recombination sites, thereby producing a second product polynucleotide 
construct. According to this exemplary method, said second polynucleotide 
construct will encode a fusion protein comprising: (i) said amino acid 
sequence tag, and (ii) the amino acid sequence encoded by said nucleotide 
sequence of interest. 

[00151] The invention also includes host cells comprising one or more 
polynucleotide constructs that encode a fusion protein, e.g., a fusion protein 
that comprises one or more amino acid sequence tags, wherein said 
polynucleotide construct is produced according to a method of the invention. 
[00152] The nucleic acid molecules and methods of the invention can be used, 
e.g., to produce a fusion protein comprising one or more amino acid sequence 
tags, and an amino acid sequence encoded by a nucleic acid sequence of 
interest. Accordingly, the present invention includes methods for producing 
fusion proteins comprising one or more amino acid tags. The methods of the 
invention can be used to produce fusion proteins in vitro or in vivo. When in 
vivo methods are used, the fusion protein can be produced in either eukaryotic 
or prokaryotic cells. Methods for producing proteins in vivo and in vitro are 
well known in the art. 
[00153] According to certain embodiments, the invention provides methods for 
producing a fusion protein that comprises one or more amino acid sequence 
tags, said methods comprising: (a) obtaining a host cell comprising a 
polynucleotide construct that encodes a fusion protein that comprises one or 
more amino acid sequence tags, said polynucleotide construct produced 
according to a method of the invention; and (b) culturing said host cell under 
conditions wherein said fusion protein is produced by said host cell. The 
precise conditions for producing a fusion protein in a host cell will vary, 
depending on the host cell used and the nature of the fusion protein being 
produced, and will be appreciated by those of ordinary skill in the art. In 
certain embodiments, the methods of the invention further comprise culturing 
said host cell under conditions wherein said fusion protein is post- 
translationally modified in said host cell. For example, the fusion protein may 
be biotinylated in said host cell. 
[00154] In yet other embodiments, the methods may further comprise: (a) causing 
said fusion protein to be released from said host cell or treating said host cell 
such that said fusion protein is released from said host cell; and (b) contacting 
said fusion protein with a detecting composition comprising a molecule that is 
capable of interacting with said fusion protein. In an exemplary embodiment, 
the fusion protein will be a post-translationally modified fusion protein, e.g., a 
biotinylated fusion protein, and said detecting composition will comprise 
avidin or an avidin analogue (including e.g., streptavidin). 
[00155] Methods for treating a host cell such that a protein, produced therein, is 
released from said host cell, are well known in the art and include, e.g., 
chemical disruption of the cell and physical disruption of the cell including, 
e.g., boiling, freezing, grinding, and combinations of chemical and physical 
disruption of the cell. Such methods include producing a protein extract from 
said host cell. 

[00156] Details regarding the production and detection of fusion proteins that 
comprise one or more amino acid sequence tags, in general, are known in the 
art. (See, e.g., Parrott, M.B. and Barry, M.A., Biochem. Biophys. Res. Comm. 
257:993-1000 (2001), Parrott, M.B. and Barry, M.A., Mol. Ther. 7:96-104 
(2000), U.S. Patent No. 5,252,466, and references cited therein). 
[00157] The invention also includes methods for purifying, isolating or 
concentrating fusion proteins that are produced using the compositions and 
methods of the invention. In one embodiment, the invention includes methods 
for purifying, isolating or concentrating fusion proteins that have been post- 
translationally modified by a post-translational modification reaction, either in 
vivo or in vitro. In another embodiment, the invention includes methods for 
purifying, isolating or concentrating fusion proteins that comprise an amino 
acid sequence that is capable of being recognized by one or more antibody (or 
fragment thereof) or other specific reagents. 
[00158] In an exemplary embodiment, the fusion proteins of the invention are 
purified, isolated or concentrated by bringing the fusion proteins into contact 
with a composition that is capable of interacting with the amino acid sequence 
tag and/or with a molecular entity that is attached to the amino acid sequence 
tag. Such compositions that interact specifically with an amino acid sequence 
tag include, e.g., "detecting compositions." As used herein, the term "detecting 
composition" is intended to mean any composition comprising a molecule that 
is capable of interacting with an amino acid sequence tag or with a molecular 
entity that is attached to an amino acid sequence tag, e.g., a molecule that is 
capable of interacting with a molecular entity that was attached to the amino 
acid sequence tag in a post-translational modification reaction. Such 
molecules that interact with amino acid sequence tags include, e.g., proteins 
and polypeptides, including, e.g., antibodies (or fragments thereof including 
Fab fragments, Fc fragments, etc.) specific for the amino acid sequence tag. 
Particular exemplary molecules that can be attached to a detecting 
composition include avidin, streptavidin, and derivatives and analogs of those 
two compounds, as well as metal compounds (e.g., arsenites and thallium) that 
bind to dithiols such as lipoic acid (U.S. Patent No. 5,252,466), and antibodies 
(or fragments thereof) specific for epitopes such as, e.g., the FLAG epitope, 
the Myc epitope, the HA epitope, etc. 
[00159] Detecting compositions may further comprise a surface (including, 
e.g., a solid and semi-solid surface), a matrix or a substrate, to which the 
molecule that is capable of interacting with particular amino acid sequence tag 
(or molecular entity attached thereto) is attached. Exemplary surfaces, 
matrices and substrates include, e.g., agarose beads, plastic beads, microscope 
coverslips, microscope slides, magnetic beads, glass beads or planar surfaces. 
The attachment may be, e.g., covalent or non-covalent. The types of surfaces, 
matrices and substrates to which a molecule that is capable of interacting with 
an amino acid sequence tag (or molecular entity attached thereto) may be 
attached are known in the art (see, e.g., Zou, H. et al., J. Biochem. Biophys. 
Methods 49:1-3:199-240 (2001), Zusman, R. and Zusman, I., J. Biochem. 
Biophys. Methods 49:1-3:175-187 (2001)). Exemplary detecting compositions 
include agarose beads to which avidin, streptavidin, or derivatives/analogs 
thereof, are attached. 

[00160] In certain embodiments, the detecting composition may be used to 
identify, concentrate or purify a fusion protein by, e.g., mixing the detecting 
composition with a solution or composition comprising the fusion protein of 
interest, wherein the mixing takes place in batch (e.g., in a vessel such as a 
beaker, flask, bottle, test tube, petri dish, or other suitable container) or 
through a column containing the detecting composition. The detecting 
composition may alternatively be applied to a solution, to a cell (e.g., a 
permeabilized cell), or to any other substance that is known to contain or 
suspected of containing the fusion protein of interest. 
[00161] In certain embodiments, the fusion proteins of the invention will be 
post-translationally modified fusion proteins, e.g., fusion proteins that have 
been biotinylated at the amino acid sequence tag. The biotinylated fusion 
protein can be purified, isolated or concentrated from a mixture of other 
proteins and molecules by bringing the biotinylated fusion protein into contact 
with, e.g., a detecting composition comprising a molecule that specifically 
interacts with biotin. Such molecules include, e.g., avidin and avidin 
derivatives such as streptavidin. The detecting composition may further 
comprise a surface or support matrix that can be physically removed from a 
mixture of proteins and other molecules, e.g., agarose beads, or other 
equivalent beads. 

[00162] In other embodiments, the fusion protein that is produced using the 
methods and compositions of the invention will comprise an amino acid 
sequence that is capable of being cleaved by one or more proteases, flanked on 
one side by an amino acid sequence tag, and on the other side by an amino 
acid sequence encoded by a nucleic acid sequence of interest. After purifying, 
isolating or concentrating such a fusion protein, the fusion protein can be 
treated with a protease to separate the amino acid sequence tag from the amino 
acid sequence encoded by a nucleic acid sequence of interest. 
[00163] The invention also includes compositions or reaction mixtures 
comprising one or more nucleic acid molecules of the invention. The 
compositions or reaction mixtures may additionally comprise one or more 
additional components selected from the group consisting of one or more 
topoisomerases, one or more host cells (e.g., host cells that may be competent 
for uptake of nucleic acid molecules), one or more recombination proteins, one 
or more vectors, one or more nucleotides, one or more primers, and one or 
more polypeptides having polymerase activity. 
[00164] The invention also provides kits comprising the isolated nucleic acid 
molecules of the invention, which may optionally comprise one or more 
additional components selected from the group consisting of one or more 
topoisomerases, one or more recombination proteins, one or more vectors, one 
or more nucleotides, one or more primers, one or more polypeptides having 
polymerase activity, one or more host cells (e.g., host cells that may be 
competent for uptake of nucleic acid molecules), one or more antibodies (or 
fragments thereof), and one or more detecting compositions, including, e.g., 
one or more support matrices complexed with avidin or an avidin analog. 
[00165] It will be readily apparent to one of ordinary skill in the relevant arts 
that other suitable modifications and adaptations to the methods and 
applications described herein are obvious and may be made without departing 
from the scope of the invention or any embodiment thereof. Having now 
described the present invention in detail, the same will be more clearly 
understood by reference to the following examples, which are included 
herewith for purposes of illustration only and are not intended to be limiting of 
the invention. 

EXAMPLE 1 

A Gateway™-Adapted Destination Vector for Cloning and Expression of 
Biotinylated Fusion Proteins in E. coli 

[00166] This example describes the pET104-DEST expression vector (Fig. 1). 
pET104-DEST is a 7.6 kb vector adapted for use with the Gateway™ 
Technology, and is designed to allow for high-level, inducible expression of 
biotinylated recombinant fusion proteins in E. coli using the pET system. 
Biotinylated recombinant protein may then be easily detected or immobilized 
to a solid support for other downstream applications. 

[00167] The pET system was originally developed by Studier and colleagues 
and takes advantage of the high activity and specificity of the bacteriophage 
T7 RNA polymerase to allow regulated expression of heterologous genes in E. 
coli from the T7 promoter (Rosenberg, A.H. et al., Gene 56:125-135 (1987); 
Studier, F.W. and Moffatt, B.A., J. Mol. Biol. 189:113-130 (1986); Studier, 
F.W. et al., Meth. Enzymol. 185:60-89 (1990)). 
[00168] The pET104-DEST vector comprises the following elements: 

(a) T7lac promoter for high-level, IPTG-inducible expression of the 
gene of interest in E. coli (Dubendorff, J.W., and Studier, F.W., J. 
Mol. Biol. 219:45-59 (1991); Studier, F.W. et al., Meth. 
Enzymol. 185:60-89 (1990)); 

(b) Biotag™ to allow biotinylation of the recombinant protein of 
interest for easy detection or use in other applications; 

(c) Enterokinase (EK) recognition site for cleavage of the Biotag™ 
from the recombinant protein; 

(d) Two recombination sites, attR1 and attR2, downstream of the 
T7lac promoter for recombinational cloning of the gene of interest 
from an entry clone; 

(e) Chloramphenicol resistance gene (CmR) located between the two 
attR sites for counterselection; 

(f) The ccdB gene located between the attR sites for negative 
selection; 

(g) lacI gene encoding the lac repressor to reduce basal transcription 
from the T7lac promoter in the pET104-DEST vector and from the 
lacUV5 promoter in the E. coli chromosome; 

(h) Ampicillin resistance gene for selection in E. coli; and 

(i) pBR322 origin for low-copy replication and maintenance of the 
plasmid in E. coli. 

[00169] The control plasmid, pET104/GW/lacZ (Fig. 2), can be used as a 
positive control for expression in E. coli. pET104/GW/lacZ was generated 
using the Gateway LR recombination reaction between an entry clone 
containing the lacZ gene and pET104-DEST. 
[00170] To recombine a gene of interest into pET104-DEST, an entry clone 
containing a gene of interest will be obtained. Details relating to choosing an 
entry vector and constructing an entry clone are available in the art (see, e.g., 
U.S. Patent No. 6,270,969). 
[00171] pET104-DEST is an N-terminal fusion vector and contains an ATG 
initiation codon. A Shine-Dalgarno ribosome binding site (RBS) is included 
upstream of the initiation codon. The gene of interest in the entry clone must: 
(a) be in frame with the N-terminal Biotag™ after recombination; and (b) 
contain a stop codon. 
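
The two requirements on the insert can be checked in silico before setting up the recombination reaction. The following is a minimal sketch only; the function name `check_insert` is hypothetical, and it assumes the sequence supplied is the coding strand and that its first base is the first base of a codon in the Biotag™ reading frame (the actual frame depends on the attB sequence shown in Fig. 11, which is not reproduced here).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_insert(orf: str) -> list[str]:
    """Return a list of problems found in a candidate insert.

    Assumption: `orf` is the coding strand (5'->3') and its first base
    is codon position 1 in the frame of the N-terminal tag.
    """
    seq = orf.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Split into complete codons only; a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop codon at the 3' end")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon in frame")
    return problems


if __name__ == "__main__":
    print(check_insert("ATGGCTTAA"))     # well-formed toy insert
    print(check_insert("ATGTAAGCTTGA"))  # premature stop codon
```

An empty list means the toy checks passed; it does not replace sequencing of the final expression clone.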

[00172] The entry clone will contain, e.g., attL sites flanking the gene of 
interest. Genes in an entry clone are transferred to the destination vector 
backbone by mixing the DNAs with, e.g., the Gateway LR Clonase Enzyme 
Mix. The resulting LR recombination reaction is then transformed into E. coli 
(e.g., TOP10 or DH5α-T1R) and the expression clone is selected using 
ampicillin. Recombination between the attR sites on the destination vector 
and the attL sites on the entry clone replaces the chloramphenicol (CmR) gene 
and the ccdB gene with the gene of interest and results in the formation of attB 
sites in the expression clone. Details for setting up the recombination reaction, 
transforming E. coli, and selecting for the expression clone are available in 
the art. 

[00173] The recombination region of the expression clone resulting from 
pET104-DEST x entry clone is depicted in Fig. 11. Features of the 
recombination region are as follows: 

(a) Shaded regions correspond to those DNA sequences transferred 
from the entry clone into the pET104-DEST vector by 
recombination. Non-shaded regions are derived from the 
pET104-DEST vector. 

(b) Bases 568 and 2230 of the pET104-DEST sequence are marked. 

(c) The biotin binding site is labeled with an asterisk (*). 

[00174] The expression clone can be confirmed following recombination. The 
ccdB gene mutates at a very low frequency, resulting in a very low number of 
false positives. True expression clones will be ampicillin-resistant and 
chloramphenicol-sensitive. Transformants containing a plasmid with a 
mutated ccdB gene will be both ampicillin- and chloramphenicol-resistant. To 
check a putative expression clone, transformants can be tested for growth on 
LB plates containing 30 µg/ml chloramphenicol. A true expression clone 
should not grow in the presence of chloramphenicol. 
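
The screening logic described above reduces to a two-antibiotic truth table. The sketch below is illustrative only: the function name and the wording of the returned labels are assumptions, and the third outcome (no growth on ampicillin) is added for completeness rather than taken from the text.

```python
def classify_transformant(grows_on_amp: bool, grows_on_cm: bool) -> str:
    """Interpret growth on ampicillin and on 30 µg/ml chloramphenicol."""
    if grows_on_amp and not grows_on_cm:
        # AmpR and CmS: the CmR/ccdB cassette has been replaced.
        return "putative expression clone"
    if grows_on_amp and grows_on_cm:
        # AmpR and CmR: the cassette is still present (e.g., mutated ccdB).
        return "false positive"
    # Not described in the text; included so every input has an answer.
    return "not ampicillin-resistant"


if __name__ == "__main__":
    for amp, cm in [(True, False), (True, True), (False, False)]:
        print(amp, cm, "->", classify_transformant(amp, cm))
```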

[00175] The expression construct may also be sequenced to confirm that the 
gene of interest is in frame with the Biotag™. The priming sites indicated in 
Fig. 11 can be used to sequence the insert. 

[00176] Expression of the recombinant fusion protein can be induced by first 
transforming the expression clone into an appropriate E. coli strain for protein 
expression, e.g., BL21 cells. The transformant is then grown to mid-log in LB 
containing 100 µg/ml ampicillin or 50 µg/ml carbenicillin, and IPTG is added 
to a final concentration of 0.5-1 mM. 
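
The volume of IPTG stock needed to reach the stated final concentration follows from C1·V1 = C2·V2. The sketch below assumes a 1 M stock solution (an assumption; the text does not specify a stock concentration) and neglects the small volume added to the culture.

```python
def iptg_volume_ul(culture_ml: float, final_mM: float,
                   stock_mM: float = 1000.0) -> float:
    """Microlitres of IPTG stock to add to a culture.

    Assumption: stock_mM defaults to a 1 M stock, which is not stated
    in the text. The added volume is treated as negligible.
    """
    if culture_ml <= 0:
        raise ValueError("culture volume must be positive")
    if not 0 < final_mM < stock_mM:
        raise ValueError("final concentration must be below the stock")
    stock_ml = culture_ml * final_mM / stock_mM
    return stock_ml * 1000.0  # convert ml to µl


if __name__ == "__main__":
    # Bracket the 0.5-1 mM range for a 50 ml culture.
    for conc in (0.5, 1.0):
        print(f"{conc} mM: {iptg_volume_ul(50, conc):.1f} µl of stock")
```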

[00177] Expression of the recombinant fusion protein can be detected, e.g., by 
western blot analysis using, e.g., streptavidin-HRP or streptavidin-AP 
conjugates, or an antibody (or fragment thereof) specific for the protein of 
interest. 

[00178] The recombinant fusion protein can then be purified. The presence of 
the N-terminal Biotag™ in pET104-DEST allows the recombinant fusion 
protein to be biotinylated. Once biotinylated, the recombinant fusion protein 
can be purified by taking advantage of the strong association between biotin 
and avidin (and its analogs including streptavidin). For example, streptavidin 
agarose-conjugated beads can be used to purify the recombinant fusion 
protein. Other streptavidin conjugates can also be used. 
[00179] A streptavidin-agarose resin can be used for affinity purification of 
recombinant fusion proteins containing the Biotag™. The resin can be 
constructed by covalently linking streptavidin to cross-linked agarose beads 
via a 15-atom hydrophilic spacer arm specifically designed to reduce non- 
specific binding and to ensure optimal binding of biotinylated molecules. 
Streptavidin is bound to a final concentration of 2-3 mg streptavidin per ml of 
packed resin. 
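
For planning purposes, the stated loading of 2-3 mg streptavidin per ml of packed resin can be converted into a rough upper bound on biotin binding sites. The sketch below assumes a streptavidin tetramer of about 53 kDa with four biotin binding sites; both figures are assumptions not given in the text, and practical capacity for a biotinylated protein will generally be lower because of steric effects.

```python
def streptavidin_mg(resin_ml: float, low: float = 2.0,
                    high: float = 3.0) -> tuple[float, float]:
    """Range of immobilized streptavidin (mg) for a packed resin volume."""
    if resin_ml <= 0:
        raise ValueError("resin volume must be positive")
    return resin_ml * low, resin_ml * high


def max_biotin_sites_nmol(mg: float, tetramer_kda: float = 53.0,
                          sites_per_tetramer: int = 4) -> float:
    """Theoretical biotin binding sites (nmol) for a streptavidin mass.

    Assumptions: ~53 kDa tetramer, four sites per tetramer, all
    sites accessible. mg divided by kDa gives µmol of tetramer.
    """
    umol_tetramer = mg / tetramer_kda
    return umol_tetramer * 1000.0 * sites_per_tetramer


if __name__ == "__main__":
    lo_mg, hi_mg = streptavidin_mg(0.5)
    print(f"0.5 ml resin carries {lo_mg}-{hi_mg} mg streptavidin")
    print(f"upper bound: {max_biotin_sites_nmol(hi_mg):.0f} nmol sites")
```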

[00180] Recombinant fusion proteins may be purified with streptavidin-agarose 
under native or denaturing conditions. Methods for purifying biotinylated 
proteins are known in the art. 

[00181] pET104-DEST contains an enterokinase (EK) recognition site to allow 
removal of the Biotag™ from the recombinant fusion protein, if desired. After 
digestion with enterokinase, 11 amino acids will remain at the N-terminus of 
the protein (see Fig. 11). Methods for digestion with enterokinase are known 
in the art. 
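
Enterokinase recognizes the sequence Asp-Asp-Asp-Asp-Lys and cleaves after the lysine. The sketch below shows how the tag and the residual N-terminal residues separate at that site. The sequence used is an invented placeholder, not the actual Biotag™ or vector-encoded sequence, and the function name is hypothetical.

```python
EK_SITE = "DDDDK"  # enterokinase cleaves C-terminal to the lysine


def ek_digest(protein: str) -> tuple[str, str]:
    """Split a one-letter protein sequence at the first EK site.

    Returns (released tag fragment, remaining protein).
    """
    seq = protein.upper()
    idx = seq.find(EK_SITE)
    if idx == -1:
        raise ValueError("no enterokinase recognition site found")
    cut = idx + len(EK_SITE)
    return seq[:cut], seq[cut:]


if __name__ == "__main__":
    # Placeholder fusion: toy tag + EK site + toy linker + toy protein.
    fusion = "MAAAA" + "DDDDK" + "GSGSGS" + "MKTLV"
    tag, remainder = ek_digest(fusion)
    print("released tag:", tag)
    print("remaining   :", remainder)
```

In this toy example six linker residues precede the protein after digestion; for an actual expression clone the residual residues are those shown in the corresponding figure.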

EXAMPLE 2 

Directional TOPO Cloning of Blunt-End PCR Products into a Vector for 
Biotinylated Expression in E. coli 

[00182] This example describes directional TOPO cloning using the 
pET104/D-TOPO vector (Fig. 3). 
[00183] pET104/D-TOPO is a 5.9 kb vector designed to facilitate rapid, 
directional TOPO cloning of blunt-end PCR products for regulated and 
biotinylated expression in E. coli. The pET104/D-TOPO vector comprises the 
following elements: 

(a) T7lac promoter for high-level, IPTG-inducible expression of the 
gene of interest in E. coli (Dubendorff, J.W., and Studier, F.W., J. 
Mol. Biol. 219:45-59 (1991); Studier, F.W. et al., Meth. Enzymol. 
185:60-89 (1990)); 

(b) Directional TOPO cloning site for rapid and efficient directional 
cloning of blunt-end PCR products; 

(c) Biotag™ to allow biotinylation of the recombinant protein of 
interest for easy detection or use in other applications; 

(d) Enterokinase (EK) recognition site for cleavage of the Biotag™ 
from the recombinant protein; 

(e) lacI gene encoding the lac repressor to reduce basal transcription 
from the T7lac promoter in the pET104/D-TOPO vector and from 
the lacUV5 promoter in the E. coli chromosome; 

(f) Ampicillin resistance gene for selection in E. coli; and 

(g) pBR322 origin for low-copy replication and maintenance of the 
plasmid in E. coli. 

[00184] The control plasmid, pET104/D/lacZ (Fig. 4), can be used as a positive 
control for expression in E. coli. The gene encoding β-galactosidase was 
directionally TOPO cloned into the pET104/D-TOPO vector. 

[00185] Topoisomerase I from Vaccinia virus binds to duplex DNA at specific 
sites and cleaves the phosphodiester backbone after 5'-CCCTT in one strand 
(Shuman, S., Proc. Natl. Acad. Sci. USA 88:10104-10108 (1991)). The energy 
from the broken phosphodiester backbone is conserved by formation of a 
covalent bond between the 3' phosphate of the cleaved strand and a tyrosyl 
residue (Tyr-274) of topoisomerase I. The phospho-tyrosyl bond between the 
DNA and enzyme can subsequently be attacked by the 5' hydroxyl of the 
original cleaved strand, reversing the reaction and releasing topoisomerase 
(Shuman, S., J. Biol. Chem. 269:32678-32684 (1994)). TOPO cloning 
exploits this reaction to efficiently clone PCR products. 
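
Because cleavage occurs after 5'-CCCTT, it can be useful to locate this pentamer in a sequence of interest. The sketch below scans both strands of a duplex given only its top strand: a site on the bottom strand appears as AAGGG when read along the top strand. The function names are hypothetical, and the example sequence is arbitrary.

```python
def find_all(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every occurrence of motif."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)
    return hits


def topo_sites(top_strand: str) -> dict[str, list[int]]:
    """Locate 5'-CCCTT sites on both strands of a duplex.

    Positions are 0-based indices in the top strand. Bottom-strand
    sites are reported where their complement, AAGGG, begins.
    """
    seq = top_strand.upper()
    return {
        "top": find_all(seq, "CCCTT"),
        "bottom": find_all(seq, "AAGGG"),
    }


if __name__ == "__main__":
    print(topo_sites("GGCCCTTAAGGGCC"))
```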
[00186] Directional joining of double-strand DNA using TOPO-charged 
oligonucleotides occurs by adding a 3' single-stranded end (overhang) to the 
incoming DNA (Cheng, C. and Shuman, S., Mol. Cell Biol. 20:8059-8068 
(2000)). This single-stranded overhang is identical to the 5' end of the TOPO- 
charged DNA fragment. A 4 nucleotide overhang sequence has been added to 
the TOPO-charged DNA and the TOPO system has been adapted to a "whole 
vector" format. 

[00187] In this system, PCR products are directionally cloned by adding four 
bases to the forward primer (CACC). The overhang in the cloning vector 
(GTGG) invades the 5' end of the PCR product, anneals to the added bases, 
and stabilizes the PCR product in the correct orientation (see Fig. 12). Inserts 
can be cloned in the correct orientation with efficiencies equal to or greater 
than 90%. 

[00188] The general steps required to clone and express a blunt-end PCR 
product are illustrated in Fig. 13. 

[00189] The following factors should be considered when designing the 
forward PCR primer: 

(a) To enable directional cloning, the forward PCR primer must 
contain the sequence, CACC, at the 5' end of the primer. The 4 
nucleotides, CACC, base pair with the overhang sequence, 
GTGG, in the pET104/D-TOPO vector. 

(b) To include the N-terminal Biotag™, it is important that the 
forward PCR primer be designed such that the gene of interest is 
in frame with the Biotag™. The initiation ATG codon is not 
needed. A Shine-Dalgarno ribosome binding site (RBS) is 
included upstream of the ATG in the N-terminal tag to ensure 
optimal spacing for proper translation initiation. 

(c) At least six non-native amino acids will be present between the 
EK cleavage site and the start of the gene of interest. 

(d) If it is desired to express the protein with a native N-terminus 
(i.e., without the Biotag™), the forward PCR primer should be 
designed to include: (i) a stop codon to terminate the Biotag™, 
and (ii) a second ribosome binding site (AGGAGG) 9-10 base 
pairs 5' of the initial ATG codon of the protein. 
[00190] The following factors should be considered when designing the reverse 
PCR primer: 

(a) It is important to include a stop codon in the reverse primer or the 
reverse primer should be designed to hybridize downstream of the 
native stop codon. 

(b) To ensure that the PCR product clones directionally with high 
efficiency, the reverse PCR primer must not be complementary to 
the overhang sequence GTGG at the 5' end. A one base pair 
mismatch can reduce the directional cloning efficiency from 90% 
to 75%, and may increase the chances of the open reading frame 
cloning in the opposite orientation. 
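
The primer design rules above lend themselves to a simple pre-order check. The sketch below is one reading of those rules and not a validated design tool: it verifies that the forward primer begins with CACC, and it counts how many of the first four bases of the reverse primer match CACC, on the assumption that a reverse primer resembling CACC at its 5' end could pair with the GTGG overhang. The threshold of three matching bases and the function names are assumptions.

```python
DIRECTIONAL_TAG = "CACC"  # pairs with the GTGG overhang in the vector


def cacc_matches(primer: str) -> int:
    """Count positions where the primer's first four bases equal CACC."""
    head = primer.upper()[:4]
    return sum(a == b for a, b in zip(head, DIRECTIONAL_TAG))


def check_primers(forward: str, reverse: str) -> list[str]:
    """Return a list of warnings for a forward/reverse primer pair."""
    warnings = []
    if not forward.upper().startswith(DIRECTIONAL_TAG):
        warnings.append("forward primer does not start with CACC")
    # Assumed threshold: 3 or 4 matches means the reverse primer's
    # 5' end could itself anneal to the GTGG overhang.
    if cacc_matches(reverse) >= 3:
        warnings.append("reverse primer 5' end resembles CACC")
    return warnings


if __name__ == "__main__":
    print(check_primers("CACCATGGCTAGC", "TTAGCCATGGAT"))
    print(check_primers("ATGGCTAGC", "CACGTTAGCCAT"))
```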

[00191] The diagram depicted in Fig. 14 is useful for designing suitable PCR 
primers to clone and express a PCR product using pET104/D-TOPO. The 
biotin binding site is designated with an asterisk (*). 

[00192] Once a desired PCR product has been produced, it can then be TOPO 
cloned into the pET104/D-TOPO vector. The recombinant vector can then be 
transformed into an appropriate E. coli strain. 

[00193] It has been found that inclusion of salt (e.g., 250 mM NaCl, 10 mM 
MgCl2) in the TOPO cloning reaction may result in an increase in the number 
of transformants. Therefore, it is recommended that salt be added to the 
TOPO cloning reaction. 

[00194] Table III describes how to set up a TOPO cloning reaction (6 µl) for 
eventual transformation into either chemically competent E. coli or 
electrocompetent E. coli. 

TABLE III 
Setting up a TOPO Cloning Reaction 

Reagents            Chemically competent E. coli     Electrocompetent E. coli 
Fresh PCR product   0.5 to 4.0 µl                    0.5 to 4.0 µl 
Salt solution 
Sterile water       Add to a final volume of 5 µl    Add to a final volume of 5 µl 
TOPO vector         1 µl                             1 µl 

[00195] Mix reaction gently and incubate for 5 minutes at room temperature 
(22-23°C). For most applications, 5 minutes will yield sufficient colonies for 
analysis. Depending on the circumstances, the length of the TOPO cloning 
reaction can be varied from 30 seconds to 30 minutes. For routine subcloning 
of PCR products, 30 seconds may be sufficient. For large PCR products (>1 
kb) or if a pool of PCR products is being cloned, increasing the reaction time 
may yield more colonies. 
[00196] Place the reaction on ice or store the TOPO cloning reaction at -20°C 
overnight. 

[00197] Once the TOPO cloning reaction has been performed, the pET104/D- 
TOPO construct will be transformed into competent E. coli. Methods for 
transforming E. coli with nucleic acids are known in the art. 

[00198] Transformants can be analyzed by isolating plasmid DNA from 
transformant colonies. The isolated plasmid DNA can be checked by 
restriction analysis to confirm the presence and correct orientation of the 
insert. Additionally, the construct can be sequenced to confirm that the gene 
of interest is in frame with the N-terminal Biotag™. Forward and T7 reverse 
primers can be used to sequence the insert. Positive transformants can also be 
analyzed by PCR. 

[00199] Expression of the recombinant fusion protein can be induced by first 
transforming the expression clone into an appropriate E. coli strain for protein 
expression, e.g., BL21 cells. The transformant is then grown to mid-log in LB 
containing 100 µg/ml ampicillin or 50 µg/ml carbenicillin, and IPTG is added 
to a final concentration of 0.5-1 mM. 

[00200] Expression of the recombinant fusion protein can be detected, e.g., by 
western blot analysis using, e.g., streptavidin-HRP or streptavidin-AP 
conjugates, or an antibody (or fragment thereof) specific for the protein of 
interest. 

[00201] The recombinant fusion protein can then be purified. The presence of 
the N-terminal Biotag™ in pET104/D-TOPO allows the recombinant fusion 
protein to be biotinylated. Once biotinylated, the recombinant fusion protein 
can be purified by taking advantage of the strong association between biotin 
and avidin (and its analogs including streptavidin). For example, streptavidin 
agarose-conjugated beads can be used to purify the recombinant fusion 
protein. Other streptavidin conjugates can also be used. 
[00202] A streptavidin-agarose resin can be used for affinity purification of 
recombinant fusion proteins containing the Biotag™. The resin can be 
constructed by covalently linking streptavidin to cross-linked agarose beads 
via a 15-atom hydrophilic spacer arm specifically designed to reduce non- 
specific binding and to ensure optimal binding of biotinylated molecules. 
Streptavidin is bound to a final concentration of 2-3 mg streptavidin per ml of 
packed resin. 

[00203] Recombinant fusion proteins may be purified with streptavidin-agarose 
under native or denaturing conditions. Methods for purifying biotinylated 
proteins are known in the art. 

[00204] pET104/D-TOPO contains an enterokinase (EK) recognition site to 
allow removal of the Biotag™ from the recombinant fusion protein, if desired. 
After digestion with enterokinase, 6 amino acids will remain at the N-terminus 
of the protein (see Fig. 14). Methods for digestion with enterokinase are 
known in the art. 

EXAMPLE 3 

A Gateway-Adapted Destination Vector for Cloning and Expression of 
Biotinylated Fusion Proteins in Mammalian Cells 

[00205] This example describes the pcDNA6/Biotag™-DEST vector (Fig. 5). 
pcDNA6/Biotag™-DEST is a 7.0 kb vector adapted for use with the Gateway 
Technology, and is designed to allow high-level expression of biotinylated 
recombinant fusion proteins in mammalian cells. Biotinylated recombinant 
protein may then be easily detected or immobilized to a solid support for other 
downstream applications. 

[00206] The pcDNA6/Biotag™-DEST vector contains the following elements: 

(a) The human cytomegalovirus (CMV) immediate early 
enhancer/promoter for high level constitutive expression of the 
gene of interest in a wide range of mammalian cells 
(Andersson, S. et al., J. Biol. Chem. 264:8222-8229 (1989); 
Boshart, M. et al., Cell 41:521-530 (1985); Nelson, J.A. et al., 
Molec. Cell. Biol. 7:4125-4129 (1987)); 

(b) Biotag™ to allow biotinylation of the recombinant protein of 
interest for easy detection or use in other applications; 

(c) Enterokinase (EK) recognition site for cleavage of the Biotag™ 
from the recombinant protein; 

(d) Two recombination sites, attR1 and attR2, downstream of the 
CMV promoter for recombinational cloning of the gene of 
interest from an entry clone; 

(e) Chloramphenicol resistance gene (CmR) located between the 
two attR sites for counterselection; 

(f) The ccdB gene located between the attR sites for negative 
selection; 

(g) Blasticidin (bsd) resistance gene for selection of stable cell 
lines using blasticidin; 

(h) Ampicillin resistance gene for selection in E. coli; and 

(i) pUC origin for high-copy replication and maintenance of the 
plasmid in E. coli. 

[00207] The control plasmid, pcDNA6/Biotag™-GW/lacZ (Fig. 6), can be used 
as a positive control for transfection and expression in the mammalian cell line 
of choice. pcDNA6/Biotag™-GW/lacZ was generated using the Gateway LR 
recombination reaction between an entry clone containing the lacZ gene and 
pcDNA6/Biotag™-DEST. 
[00208] To recombine a gene of interest into pcDNA6/Biotag™-DEST, an 
entry clone containing the gene of interest must first be obtained. Details 
relating to choosing an entry vector and constructing an entry clone are 
available in the art (see, e.g., U.S. Patent No. 6,270,969). 
[00209] pcDNA6/Biotag™-DEST is an N-terminal fusion vector and contains 
an ATG initiation codon in the context of a Kozak consensus sequence to 
ensure optimal translation initiation. The gene of interest in the entry clone 
must: (a) be in frame with the N-terminal Biotag™ after recombination; and 
(b) contain a stop codon. 
[00210] The entry clone will contain, e.g., attL sites flanking the gene of 
interest. Genes in an entry clone are transferred to the destination vector 
backbone by mixing the DNAs with, e.g., the Gateway LR Clonase Enzyme 
Mix. The resulting LR recombination reaction is then transformed into E. coli 
(e.g., TOP10 or DH5α-T1R) and the expression clone is selected using 
ampicillin. Recombination between the attR sites on the destination vector 
and the attL sites on the entry clone replaces the chloramphenicol (CmR) gene 
and the ccdB gene with the gene of interest and results in the formation of attB 
sites in the expression clone. Details for setting up the recombination reaction, 
transforming E. coli, and selecting for the expression clone are available in 
the art. 

[00211] The recombination region of the expression clone resulting from 
pcDNA6/Biotag™-DEST x entry clone is depicted in Fig. 15. Features of the 
recombination region are as follows: 

(a) Shaded regions correspond to those DNA sequences transferred 
from the entry clone into the pcDNA6/Biotag™-DEST vector by 
recombination. Non-shaded regions are derived from the 
pcDNA6/Biotag™-DEST vector. 

(b) Bases 1191 and 2853 of the pcDNA6/Biotag™-DEST sequence are 
marked. 

(c) The biotin binding site is labeled with an asterisk (*). 

(d) Potential stop codons are underlined. 

[00212] The expression clone can be confirmed following recombination. The 
ccdB gene mutates at a very low frequency, resulting in a very low number of 
false positives. True expression clones will be ampicillin-resistant and 
chloramphenicol-sensitive. Transformants containing a plasmid with a 
mutated ccdB gene will be both ampicillin- and chloramphenicol-resistant. To 
check a putative expression clone, transformants can be tested for growth on 
LB plates containing 30 µg/ml chloramphenicol. A true expression clone 
should not grow in the presence of chloramphenicol. 
[00213] The expression construct may also be sequenced to confirm that the 
gene of interest is in frame with the Biotag™. The priming sites indicated in 
Fig. 15 can be used to sequence the insert. 

[00214] Before expression of the recombinant fusion protein can be induced, 
the expression clone must first be transfected into the mammalian cells of 
choice. Methods for transfecting mammalian cells are known in the art. 
Exemplary methods of transfection include calcium phosphate precipitation, 
lipid-mediated transfection, and electroporation. Following transfection, a 
stable cell line can be generated. 

[00215] Expression of the recombinant fusion protein can be assayed from 
either transiently transfected cells or stable cell lines. Expression of the 
recombinant fusion protein can be detected, e.g., by western blot analysis 
using, e.g., streptavidin-HRP or streptavidin-AP conjugates, or an antibody (or 
fragment thereof) specific for the protein of interest. 

[00216] The recombinant fusion protein can then be purified. The presence of 
the N-terminal Biotag™ in pcDNA6/Biotag™-DEST allows the recombinant 
fusion protein to be biotinylated. Once biotinylated, the recombinant fusion 
protein can be purified by taking advantage of the strong association between 
biotin and avidin (and its analogs including streptavidin). For example, 
streptavidin agarose-conjugated beads can be used to purify the recombinant 
fusion protein. Other streptavidin conjugates can also be used. 
[00217] A streptavidin-agarose resin can be used for affinity purification of 
recombinant fusion proteins containing the Biotag™. The resin can be 
constructed by covalently linking streptavidin to cross-linked agarose beads 
via a 15-atom hydrophilic spacer arm specifically designed to reduce non- 
specific binding and to ensure optimal binding of biotinylated molecules. 
Streptavidin is bound to a final concentration of 2-3 mg streptavidin per ml of 
packed resin. 

[00218] Recombinant fusion proteins may be purified with streptavidin-agarose 
under native or denaturing conditions. Methods for purifying biotinylated 
proteins are known in the art. 

[00219] pcDNA6/Biotag™-DEST contains an enterokinase (EK) recognition 
site to allow removal of the Biotag™ from the recombinant fusion protein, if 
desired. After digestion with enterokinase, 12 amino acids will remain at the 
N-terminus of the protein (see Fig. 15). Methods for digestion with 
enterokinase are known in the art. 

EXAMPLE 4 

Directional TOPO Cloning of Blunt-End PCR Products into a Vector for 
Biotinylated Expression in Mammalian Cells 

[00220] This example describes directional TOPO cloning using the 
pcDNA6/Biotag™/D-TOPO vector (Fig. 7). 

[00221] pcDNA6/Biotag™/D-TOPO is a 5.3 kb expression vector designed to 
facilitate rapid directional cloning of blunt-end PCR products for high-level 
expression and biotinylation in mammalian cells. Biotinylated recombinant 
protein may then be easily detected or immobilized to a solid support for other 
downstream applications. The pcDNA6/Biotag™/D-TOPO vector comprises 
the following elements: 

(a) The human cytomegalovirus (CMV) immediate early 
enhancer/promoter for high level constitutive expression of the 
gene of interest in a wide range of mammalian cells (Andersson, S. 
et al., J. Biol. Chem. 264:8222-8229 (1989); Boshart, M. et al., 
Cell 41:521-530 (1985); Nelson, J.A. et al., Molec. Cell. Biol. 
7:4125-4129 (1987)); 

(b) Biotag™ to allow biotinylation of the recombinant protein of 
interest for easy detection or use in other applications; 

(c) Enterokinase (EK) recognition site for cleavage of the Biotag™ 
from the recombinant protein; 

(d) TOPO cloning site for rapid and efficient directional cloning of 
blunt-end PCR products; 

(e) Blasticidin (bsd) resistance gene for selection of stable cell lines 
using blasticidin. 

[00222] The control plasmid, pcDNA6/Biotag™/lacZ (Fig. 8), can be used as a 
positive control for expression in E. coli. The gene encoding β-galactosidase 
was directionally TOPO cloned into the pcDNA6/Biotag™/D-TOPO vector. 

[00223] The theory behind topoisomerase cloning is described under Example 
2, supra. 

[00224] The general steps required to clone and express a blunt-end PCR 
product are illustrated in Fig. 16. 

[00225] The following factors should be considered when designing the 
forward PCR primer: 

(a) To enable directional cloning, the forward PCR primer must 
contain the sequence, CACC, at the 5' end of the primer. The 4 
nucleotides, CACC, base pair with the overhang sequence, 
GTGG, in the pcDNA6/Biotag™/D-TOPO vector. 

(b) To include the N-terminal Biotag™, it is important that the 
forward PCR primer be designed such that the gene of interest is 
in frame with the Biotag™. The initiation ATG codon is not 
needed. 

(c) If it is desired to express the protein with a native N-terminus 
(i.e., without the Biotag™), the forward PCR primer should be 
designed to include: (i) a stop codon to terminate the Biotag™, 
and (ii) the ATG initiation codon within the context of a Kozak 
consensus sequence to ensure optimal translation initiation. 
[00226] The following factors should be considered when designing the reverse 
PCR primer: 

(a) It is important to include a stop codon in the reverse primer or the 
reverse primer should be designed to hybridize downstream of the 
native stop codon. 

(b) To ensure that the PCR product clones directionally with high 
efficiency, the reverse PCR primer must not be complementary to 
the overhang sequence GTGG at the 5' end. A one base pair 
mismatch can reduce the directional cloning efficiency from 90% 
to 75%, and may increase the chances of the open reading frame 
cloning in the opposite orientation. 

[00227] The diagram depicted in Fig. 17 is useful for designing suitable PCR 
primers to clone and express a PCR product using pcDNA6/Biotag™/D-TOPO. 
The biotin binding site is designated with an asterisk (*). 

[00228] Once a desired PCR product has been produced, it can then be TOPO 
cloned into the pcDNA6/Biotag™/D-TOPO vector. The recombinant vector 
can then be transformed into an appropriate E. coli strain. 

[00229] It has been found that inclusion of salt (e.g., 250 mM NaCl, 10 mM 
MgCl2) in the TOPO cloning reaction may result in an increase in the number 
of transformants. Therefore, it is recommended that salt be added to the 
TOPO cloning reaction. 

[00230] Table IV describes how to set up a TOPO cloning reaction (6 µl) for 
eventual transformation into either chemically competent E. coli or 
electrocompetent E. coli. 

TABLE IV 
Setting up a TOPO Cloning Reaction 

Reagents            Chemically competent E. coli     Electrocompetent E. coli 
Fresh PCR product   0.5 to 4.0 µl                    0.5 to 4.0 µl 
Salt solution       1 µl 
Sterile water       Add to a final volume of 5 µl    Add to a final volume of 5 µl 
TOPO vector         1 µl                             1 µl 


[00231] Mix reaction gently and incubate for 5 minutes at room temperature 
(22-23°C). For most applications, 5 minutes will yield sufficient colonies for 
analysis. Depending on the circumstances, the length of the TOPO cloning 
reaction can be varied from 30 seconds to 30 minutes. For routine subcloning 
of PCR products, 30 seconds may be sufficient. For large PCR products (>1 
kb) or if a pool of PCR products is being cloned, increasing the reaction time 
may yield more colonies. 

[00232] Place the reaction on ice or store the TOPO cloning reaction at -20°C 
overnight. 

[00233] Once the TOPO cloning reaction has been performed, the 
pcDNA6/Biotag™/D-TOPO construct will be transformed into competent E. 
coli. Methods for transforming E. coli with nucleic acids are known in the art. 

[00234] Transformants can be analyzed by isolating plasmid DNA from 
transformant colonies. The isolated plasmid DNA can be checked by 
restriction analysis to confirm the presence and correct orientation of the 
insert. Additionally, the construct can be sequenced to confirm that the gene 
of interest is in frame with the N-terminal Biotag™. Forward and T7 reverse 
primers can be used to sequence the insert. Positive transformants can also be 
analyzed by PCR. 

[00235] Before expression of the recombinant fusion protein can be induced,
the expression clone must first be transfected into the mammalian cells of
choice. Methods for transfecting mammalian cells are known in the art.
Exemplary methods of transfection include calcium phosphate transfection,
lipid-mediated transfection, and electroporation. Following transfection, a
stable cell line can be generated.

[00236] Expression of the recombinant fusion protein can be assayed from
either transiently transfected cells or stable cell lines. Expression of the 
recombinant fusion protein can be detected, e.g., by western blot analysis 
using, e.g., streptavidin-HRP or streptavidin-AP conjugates, or an antibody (or 
fragment thereof) specific for the protein of interest. 

[00237] The recombinant fusion protein can then be purified. The presence of 
the N-terminal Biotag™ in pcDNA6/Biotag™/D-TOPO allows the 
recombinant fusion protein to be biotinylated. Once biotinylated, the 
recombinant fusion protein can be purified by taking advantage of the strong 
association between biotin and avidin (and its analogs including streptavidin). 
For example, streptavidin agarose-conjugated beads can be used to purify the 
recombinant fusion protein. Other streptavidin conjugates can also be used. 

[00238] A streptavidin-agarose resin can be used for affinity purification of 
recombinant fusion proteins containing the Biotag™. The resin can be 
constructed by covalently linking streptavidin to cross-linked agarose beads 
via a 15-atom hydrophilic spacer arm specifically designed to reduce non- 
specific binding and to ensure optimal binding of biotinylated molecules. 
Streptavidin is bound to a final concentration of 2-3 mg streptavidin per ml of 
packed resin. 

[00239] Recombinant fusion proteins may be purified with streptavidin-agarose
under native or denaturing conditions. Methods for purifying biotinylated 
proteins are known in the art. 

[00240] pcDNA6/Biotag™/D-TOPO contains an enterokinase (EK) recognition 
site to allow removal of the Biotag™ from the recombinant fusion protein, if 
desired. After digestion with enterokinase, 13 amino acids will remain at the 
N-terminus of the protein (see Fig. 17). Methods for digestion with 
enterokinase are known in the art. 
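
The effect of enterokinase digestion on the fusion protein can be illustrated with a short sketch. It assumes the canonical enterokinase recognition sequence Asp-Asp-Asp-Asp-Lys, with cleavage after the lysine. The tag, linker, and protein sequences below are hypothetical placeholders, not the sequences of Fig. 17; the 13-residue placeholder linker merely mirrors the 13 amino acids said to remain at the N-terminus.

```python
EK_SITE = "DDDDK"  # canonical enterokinase site; cleavage occurs after the lysine


def enterokinase_digest(fusion_protein):
    """Split a one-letter-code protein at the first enterokinase site.

    Returns (released_fragment, remaining_protein). The released fragment
    carries the tag and the recognition site itself.
    """
    index = fusion_protein.find(EK_SITE)
    if index == -1:
        raise ValueError("no enterokinase recognition site found")
    cut = index + len(EK_SITE)
    return fusion_protein[:cut], fusion_protein[cut:]


# Hypothetical placeholder sequences, for illustration only.
tag = "MAGKPLEIH"
linker = "SLYKKAGSAAAPF"        # 13 residues left upstream of the protein
protein_of_interest = "MKTAYIAK"

released, remaining = enterokinase_digest(tag + EK_SITE + linker + protein_of_interest)
print("Released:", released)
print("Remaining:", remaining)
print("Extra N-terminal residues:", len(remaining) - len(protein_of_interest))
```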

EXAMPLE 5 

A Gateway™-Adapted Destination Vector for the Stable Expression of Biotinylated 
Fusion Proteins in Drosophila Schneider 2 Cells 

[00241] This example describes the pMT/Biotag™-DEST vector (Fig. 9). 
pMT/Biotag™-DEST is a 5.4 kb vector adapted for use with the Gateway 
Technology, and is designed to allow high-level expression of biotinylated 
recombinant fusion proteins in Drosophila Schneider 2 (S2) cells. 
Biotinylated recombinant protein may then be easily detected or immobilized 
to a solid support for other downstream applications. 

[00242] The pMT/Biotag™-DEST vector contains the following elements: 

(a) The Drosophila metallothionein (MT) promoter for high-level, 
metal-inducible expression of a gene of interest in S2 cells. 

(b) Biotag™ to allow biotinylation of the recombinant protein of 
interest for easy detection or use in other applications. 

(c) Two recombination sites, attR1 and attR2, downstream of the MT
promoter for recombinational cloning of the gene of interest from
an entry clone.

(d) Chloramphenicol resistance gene (CmR) located between the attR
sites for counterselection.

(e) The ccdB gene located between the attR sites for negative
selection.

(f) pUC origin for high-copy replication and maintenance of the
plasmid in E. coli. 

(g) Ampicillin resistance gene for selection in E. coli. 

[00243] The control plasmid, pMT/Biotag™/GW-lacZ (Fig. 10), can be used as
a positive control for transfection and expression in the cell line of
choice. pMT/Biotag™/GW-lacZ was generated using the Gateway LR
recombination reaction between an entry clone containing the lacZ gene and
pMT/Biotag™-DEST.

[00244] To recombine a gene of interest into pMT/Biotag™-DEST, an entry 
clone containing the gene of interest must first be obtained. Details relating to 
choosing an entry vector and constructing an entry clone are available in the 
art (See, e.g., U.S. Patent No. 6,270,969). 

[00245] pMT/Biotag™-DEST is an N-terminal fusion vector and contains an 
ATG initiation codon. The gene of interest in the entry clone must: (a) be in 
frame with the N-terminal Biotag™ after recombination; and (b) contain a stop 
codon. 
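
A partial check of requirement (b), together with a necessary condition for requirement (a), can be sketched as below. The helper name is hypothetical. The sketch only verifies that the open reading frame is a whole number of codons and ends in a stop codon; it does not model the junction between the attB site and the first codon, which also determines whether the gene is in frame with the Biotag™.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_entry_orf(orf):
    """Return a list of problems found in a DNA open reading frame (empty if none)."""
    seq = orf.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of three")
    # Split into complete codons only; a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no stop codon at the end of the reading frame")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon would truncate the fusion protein")
    return problems


print(check_entry_orf("ATGGCCTAA"))
print(check_entry_orf("ATGGCCTA"))
```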

[00246] The entry clone will contain, e.g., attL sites flanking the gene of
interest. Genes in an entry clone are transferred to the destination vector 
backbone by mixing the DNAs with, e.g., the Gateway LR Clonase Enzyme 
Mix. The resulting LR recombination reaction is then transformed into E. coli 
(e.g., TOP10 or DH5α-T1R) and the expression clone is selected using
ampicillin. Recombination between the attR sites on the destination vector
and the attL sites on the entry clone replaces the chloramphenicol (CmR) gene
and the ccdB gene with the gene of interest and results in the formation of attB 
sites in the expression clone. Details for setting up the recombination reaction, 
transforming E. coli, and selecting for the expression clone, are available in 
the art. 
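
The outcome of the LR recombination reaction described above can be modeled with a toy data structure. This is a bookkeeping sketch only, with hypothetical class and function names. It assumes the entry clone sites are attL1 and attL2 (the text says only "attL sites") and relies on the general property of this system that attL x attR recombination yields attB sites in the expression clone and attP sites in the by-product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    """A DNA segment flanked by two recombination sites (toy model)."""
    left_site: str
    payload: tuple
    right_site: str


def lr_recombine(entry, destination):
    """Swap payloads between an attL-flanked entry cassette and an attR-flanked
    destination cassette, returning (expression_clone, by_product)."""
    if (entry.left_site, entry.right_site) != ("attL1", "attL2"):
        raise ValueError("entry clone must carry attL1 and attL2")
    if (destination.left_site, destination.right_site) != ("attR1", "attR2"):
        raise ValueError("destination vector must carry attR1 and attR2")
    expression_clone = Cassette("attB1", entry.payload, "attB2")
    by_product = Cassette("attP1", destination.payload, "attP2")
    return expression_clone, by_product


entry_clone = Cassette("attL1", ("gene_of_interest",), "attL2")
destination_vector = Cassette("attR1", ("CmR", "ccdB"), "attR2")
expression_clone, by_product = lr_recombine(entry_clone, destination_vector)
print(expression_clone)
print(by_product)
```

In this model the expression clone no longer carries CmR or ccdB, which is consistent with selecting the expression clone on ampicillin and counterselecting unreacted destination vector through ccdB.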

[00247] The recombination region of the expression clone resulting from 
pMT/Biotag™-DEST x entry clone is depicted in Fig. 18. Features of the 
recombination region are as follows: 

(e) shaded regions correspond to those DNA sequences transferred
from the entry clone into the pMT/Biotag™-DEST vector by 
recombination. Non-shaded regions are derived from the 
pMT/Biotag™-DEST vector; 

(f) bases 1135 and 2797 of the pMT/Biotag™-DEST sequence are 
marked. 

(g) The biotin binding site is labeled with an asterisk (*). 

(h) Potential stop codons are underlined. 

[00248] The basic steps needed to clone and express a protein using 
pMT/Biotag™-DEST are as follows: 

(a) Establish a culture of S2 cells from supplied frozen stock. 

(b) Choose a Gateway entry vector and generate an entry clone 
containing the gene of interest. 

(c) Perform an LR recombination reaction between the entry clone 
containing the gene of interest and the pMT/Biotag™-DEST 
vector. Transform E. coli and select for the expression clone. 

(d) Isolate plasmid DNA. 

(e) Transiently transfect S2 cells. 

(f) Induce, if necessary, and assay for expression of the protein. 

(g) Create stable cell lines expressing the protein of interest by 
cotransfecting the recombinant expression vector with a selection 
vector, pCoHygro (Fig. 19) or pCoBlast (Fig. 20), and select with 
the appropriate concentration of hygromycin-B or blasticidin, 
respectively. 

(h) Induce if necessary, and assay for expression of the protein. 

(i) Scale up expression, if desired. 

[00249] Expression of the recombinant fusion protein can be detected, e.g., by
western blot analysis using, e.g., streptavidin-HRP or streptavidin-AP 
conjugates, or an antibody (or fragment thereof) specific for the protein of 
interest. 

[00250] The recombinant fusion protein can then be purified. The presence of
the N-terminal Biotag™ in pMT/Biotag™-DEST allows the recombinant 
fusion protein to be biotinylated. Once biotinylated, the recombinant fusion 
protein can be purified by taking advantage of the strong association between 
biotin and avidin (and its analogs including streptavidin). For example, 
streptavidin agarose-conjugated beads can be used to purify the recombinant 
fusion protein. Other streptavidin conjugates can also be used. 

[00251] A streptavidin-agarose resin can be used for affinity purification of 
recombinant fusion proteins containing the Biotag™. The resin can be 
constructed by covalently linking streptavidin to cross-linked agarose beads 
via a 15-atom hydrophilic spacer arm specifically designed to reduce non- 
specific binding and to ensure optimal binding of biotinylated molecules. 
Streptavidin is bound to a final concentration of 2-3 mg streptavidin per ml of 
packed resin. 

[00252] Recombinant fusion proteins may be purified with streptavidin-agarose 
under native or denaturing conditions. Methods for purifying biotinylated 
proteins are known in the art. 

[00253] pMT/Biotag™-DEST contains an enterokinase (EK) recognition site to 
allow removal of the Biotag™ from the recombinant fusion protein, if desired. 
After digestion with enterokinase, 11 amino acids will remain at the
N-terminus of the protein (see Fig. 18). Methods for digestion with enterokinase
are known in the art. 

[00254] Having now fully described the present invention in some detail by 
way of illustration and example for purposes of clarity of understanding, it 
will be obvious to one of ordinary skill in the art that the same can be 
performed by modifying or changing the invention within a wide and 
equivalent range of conditions, formulations and other parameters without 
affecting the scope of the invention or any specific embodiment thereof, and 
that such modifications or changes are intended to be encompassed within the 
scope of the appended claims. 

[00255] All publications, patents and patent applications mentioned in this
specification are indicative of the level of skill of those skilled in the art to 
which this invention pertains, and are herein incorporated by reference to the 
same extent as if each individual publication, patent or patent application was 
specifically and individually indicated to be incorporated by reference. 

WHAT IS CLAIMED IS:

1. An isolated nucleic acid molecule comprising:

(a) one or more recombination sites; and 

(b) one or more nucleic acid sequences which encode an amino acid 
sequence tag. 

2. The isolated nucleic acid molecule of claim 1, further comprising at least 
one additional nucleic acid sequence selected from the group consisting of 
a selectable marker, a cloning site, a restriction site, a promoter, an 
operator, an operon, a nucleotide sequence encoding a gene product which 
allows for negative selection, an origin of replication, a nucleotide 
sequence which encodes a repressor of at least one promoter, and a gene or 
partial gene. 

3. The isolated nucleic acid molecule of claim 1, wherein a nucleic acid 
sequence of interest can be inserted at or within 20 nucleotides of said one
or more recombination sites, thereby producing a polynucleotide construct 
that encodes a fusion protein, said fusion protein comprising: (i) said 
amino acid sequence tag; and (ii) the amino acid sequence encoded by said 
nucleic acid sequence of interest. 

4. The isolated nucleic acid molecule of claim 1, further comprising a nucleic 
acid sequence that encodes an amino acid sequence that is capable of being 
cleaved by one or more proteases. 

5. The isolated nucleic acid molecule of claim 4, wherein said amino acid 
sequence that is capable of being cleaved by one or more proteases is an 
amino acid sequence that is capable of being cleaved by enterokinase. 

6. The isolated nucleic acid molecule of claim 4, wherein a nucleic acid 
sequence of interest can be inserted at or within 20 nucleotides of said one 
or more recombination sites thereby producing a polynucleotide construct
that encodes a fusion protein, said fusion protein comprising: (i) said 
amino acid sequence that is capable of being cleaved by one or more 
proteases, flanked on one side by (ii) said amino acid tag, and on the other 
side by (iii) the amino acid sequence encoded by said nucleic acid 
sequence of interest. 

7. The nucleic acid molecule of claim 1, wherein said amino acid sequence 
tag is an amino acid sequence that is capable of being post-translationally 
modified. 

8. The isolated nucleic acid molecule of claim 7, wherein said amino acid 
sequence that is capable of being post-translationally modified is an amino 
acid sequence that is capable of being post-translationally modified by 
biotinylation, attachment of 4-phosphopanthetheine, attachment of lipoic 
acid or attachment of flavins. 

9. The isolated nucleic acid molecule of claim 7, wherein said amino acid 
sequence that is capable of being post-translationally modified is an amino 
acid sequence that is capable of being biotinylated. 

10. The isolated nucleic acid molecule of claim 9, wherein said amino acid 
sequence that is capable of being biotinylated is all or a portion of the 
Klebsiella pneumoniae oxalacetate decarboxylase α subunit, all or a
portion of the Propionibacterium shermanii transcarboxylase 1.3S subunit, 
or all or a portion of the Escherichia coli biotin carboxyl carrier protein 
component of acetyl-CoA carboxylase. 

11. The isolated nucleic acid molecule of claim 9, wherein said amino acid 
sequence that is capable of being biotinylated is a portion of the
C-terminus of the Klebsiella pneumoniae oxalacetate decarboxylase α
subunit.

12. The isolated nucleic acid molecule of claim 11, wherein said amino acid 
sequence that is capable of being biotinylated is the BIOTAG™. 

13. The isolated nucleic acid molecule of claim 1, wherein said nucleic acid 
molecule is a circular molecule. 

14. The isolated nucleic acid molecule of claim 1, wherein said nucleic acid 
molecule comprises two or more recombination sites. 

15. The isolated nucleic acid molecule of claim 1, wherein said recombination 
sites are selected from the group consisting of: (a) attB sites, (b) attP sites,
(c) attL sites, (d) attR sites, (e) lox sites, (f) psi sites, (g) dif sites, (h) cer 
sites, (i) frt sites, and mutants, variants, and derivatives of the 
recombination sites of (a), (b), (c), (d), (e), (f), (g), (h), or (i) which retain 
the ability to undergo recombination. 

16. A vector comprising the isolated nucleic acid molecule of claim 1. 

17. A host cell comprising the isolated nucleic acid molecule of claim 1.

18. A host cell comprising the vector of claim 16. 

19. A method of producing a polynucleotide construct that encodes a fusion 
protein that comprises an amino acid sequence tag, said method 
comprising: 

(a) obtaining a first nucleic acid molecule comprising a nucleotide 
sequence of interest flanked by at least a first and at least a second 
recombination sites that do not recombine with each other; 

(b) obtaining a second nucleic acid molecule comprising: (i) at least a
third and fourth recombination sites that do not recombine with each 
other; and (ii) one or more nucleic acid sequences which encode an 
amino acid sequence tag; and 

(c) contacting said first nucleic acid molecule with said second nucleic 
acid molecule under conditions favoring recombination between said 
first and third and between said second and fourth recombination sites, 
thereby producing a product polynucleotide construct; 

wherein said product polynucleotide construct encodes a fusion protein
comprising: (i) said amino acid sequence tag; and (ii) the amino acid
sequence encoded by said nucleotide sequence of interest.

20. The method of claim 19, wherein said second nucleic acid molecule 
further comprises a nucleic acid sequence that encodes an amino acid 
sequence that is capable of being cleaved by one or more proteases; and 

wherein said product polynucleotide construct encodes a fusion protein 
comprising: (i) said amino acid sequence that is capable of being cleaved 
by one or more proteases, flanked on one side by (ii) said amino acid 
sequence tag, and on the other side by (iii) the amino acid sequence
encoded by said nucleotide sequence of interest. 

21. The method of claim 20, wherein said amino acid sequence that is capable 
of being cleaved by one or more proteases is an amino acid sequence that 
is capable of being cleaved by enterokinase. 

22. The method of claim 19, wherein said amino acid sequence tag is an amino 
acid sequence that is capable of being post-translationally modified. 

23. The method of claim 22, wherein said amino acid sequence that is capable 
of being post-translationally modified is an amino acid sequence that is 
capable of being post-translationally modified by biotinylation, attachment 
of 4-phosphopanthetheine, attachment of lipoic acid or attachment of
flavins. 

24. The method of claim 22, wherein said amino acid sequence that is capable 
of being post-translationally modified is an amino acid sequence that is 
capable of being biotinylated. 

25. The method of claim 24, wherein said amino acid sequence that is
capable of being biotinylated is all or a portion of the Klebsiella
pneumoniae oxalacetate decarboxylase α subunit, all or a portion of the
Propionibacterium shermanii transcarboxylase 1.3S subunit, or all or a 
portion of the Escherichia coli biotin carboxyl carrier protein component 
of acetyl-CoA carboxylase. 

26. The method of claim 24, wherein said amino acid sequence that is
capable of being biotinylated is a portion of the C-terminus of the
Klebsiella pneumoniae oxalacetate decarboxylase α subunit.

27. The method of claim 26, wherein said amino acid sequence that is capable 
of being biotinylated is the BIOTAG™. 

28. The method of claim 19, wherein said second nucleic acid molecule is a 
vector. 

29. The method of claim 19, wherein said first nucleic acid molecule is a 
circular nucleic acid molecule. 

30. The method of claim 19, wherein said first nucleic acid molecule is a 
linear nucleic acid molecule. 

31. The method of claim 30, wherein said first nucleic acid molecule is a PCR
product.

32. The method of claim 19, further comprising inserting said product 
polynucleotide construct into a host cell. 

33. The method of claim 20, further comprising inserting said product 
polynucleotide construct into a host cell. 

34. The method of claim 19, wherein said second nucleic acid molecule 
comprises at least one additional nucleic acid sequence selected from the 
group consisting of a selectable marker, a cloning site, a restriction site, a 
promoter, an operator, an operon, a nucleotide sequence encoding a gene 
product which allows for negative selection, an origin of replication, a 
nucleotide sequence which encodes a repressor of at least one promoter, 
and a gene or partial gene. 

35. The method of claim 19, wherein said first, second, third and fourth 
recombination sites are selected from the group consisting of: (a) attB 
sites, (b) attP sites, (c) attL sites, (d) attR sites, (e) lox sites, (f) psi sites,
(g) dif sites, (h) cer sites, (i) frt sites, and mutants, variants, and derivatives 
of the recombination sites of (a), (b), (c), (d), (e), (f), (g), (h), or (i) which 
retain the ability to undergo recombination. 

36. The method of claim 19, wherein said first and said second nucleic acid 
molecules are combined in the presence of at least one recombination 
protein. 

37. The method of claim 36, wherein said recombination protein is selected 
from the group consisting of: (a) Cre, (b) Int, (c) IHF, (d) Xis, (e) Fis, (f) 
Hin, (g) Gin, (h) Cin, (i) Tn3 resolvase, (j) TndX, (k) XerC, and (l) XerD.

38. The method of claim 36, wherein said recombination protein is Cre.

39. An isolated nucleic acid molecule comprising: 

(a) one or more topoisomerase recognition sites and/or one or more 
topoisomerases; and 

(b) one or more nucleic acid sequences which encode an amino acid 
sequence tag. 

40. The isolated nucleic acid molecule of claim 39, further comprising at least 
one additional nucleic acid sequence selected from the group consisting of 
a selectable marker, a cloning site, a restriction site, a promoter, an 
operator, an operon, a nucleotide sequence encoding a gene product which 
allows for negative selection, an origin of replication, a nucleotide 
sequence which encodes a repressor of at least one promoter, and a gene or 
partial gene. 

41. The isolated nucleic acid molecule of claim 39, wherein a nucleic acid
sequence of interest can be inserted at or within 20 nucleotides of said one
or more topoisomerase recognition sites and/or at or within 20 nucleotides
of the position of said one or more topoisomerases, thereby producing a
polynucleotide construct that encodes a fusion protein, said fusion protein 
comprising: (i) said amino acid sequence tag; and (ii) the amino acid 
sequence encoded by said nucleic acid sequence of interest. 

42. The isolated nucleic acid molecule of claim 39, further comprising a 
nucleic acid sequence that encodes an amino acid sequence that is capable 
of being cleaved by one or more proteases. 

43. The isolated nucleic acid molecule of claim 42, wherein said amino acid 
sequence that is capable of being cleaved by one or more proteases is an 
amino acid sequence that is capable of being cleaved by enterokinase. 



WO 2004/005482 



PCT/US2003/021339 



44. The isolated nucleic acid molecule of claim 42, wherein a nucleic acid 
sequence of interest can be inserted at or within 20 nucleotides of said one 
or more topoisomerase recognition sites and/or at the position of said one 
or more topoisomerases thereby producing a polynucleotide construct that 
encodes a fusion protein, said fusion protein comprising: (i) said amino 
acid sequence that is capable of being cleaved by one or more proteases, 
flanked on one side by (ii) said amino acid tag, and on the other side by 
(iii) the amino acid sequence encoded by said nucleic acid sequence of 
interest. 

45. The isolated nucleic acid molecule of claim 39, wherein said amino acid 
sequence tag is an amino acid sequence that is capable of being post- 
translationally modified. 

46. The isolated nucleic acid molecule of claim 45, wherein said amino acid 
sequence that is capable of being post-translationally modified is an amino 
acid sequence that is capable of being post-translationally modified by
biotinylation, attachment of 4-phosphopanthetheine, attachment of lipoic
acid or attachment of flavins. 

47. The isolated nucleic acid molecule of claim 45, wherein said amino acid 
sequence that is capable of being post-translationally modified is an amino 
acid sequence that is capable of being biotinylated. 

48. The isolated nucleic acid molecule of claim 47, wherein said amino acid 
sequence that is capable of being biotinylated is all or a portion of the 
Klebsiella pneumoniae oxalacetate decarboxylase α subunit, all or a
portion of the Propionibacterium shermanii transcarboxylase 1.3S subunit, 
or all or a portion of the Escherichia coli biotin carboxyl carrier protein 
component of acetyl-CoA carboxylase. 

49. The isolated nucleic acid molecule of claim 47, wherein said amino acid
sequence that is capable of being biotinylated is a portion of the
C-terminus of the Klebsiella pneumoniae oxalacetate decarboxylase α
subunit.

50. The isolated nucleic acid molecule of claim 49, wherein said amino acid 
sequence that is capable of being biotinylated is the BIOTAG™. 

51. The isolated nucleic acid molecule of claim 39, wherein said nucleic acid 
molecule is a circular molecule. 

52. The isolated nucleic acid molecule of claim 39, wherein said nucleic acid 
molecule comprises two or more recombination sites. 

53. The isolated nucleic acid molecule of claim 39, wherein said 
topoisomerase is a type I topoisomerase. 

54. The isolated nucleic acid molecule of claim 53, wherein said type I
topoisomerase is a type IB topoisomerase. 

55. The isolated nucleic acid molecule of claim 54, wherein said type IB 
topoisomerase is selected from the group consisting of eukaryotic nuclear 
type I topoisomerase and a poxvirus topoisomerase. 

56. The isolated nucleic acid molecule of claim 55, wherein said poxvirus 
topoisomerase is produced by or isolated from a virus selected from the 
group consisting of vaccinia virus, Shope fibroma virus, ORF virus, 
fowlpox virus, molluscum contagiosum virus and Amsacta moorei 
entomopoxvirus. 

57. A vector comprising the isolated nucleic acid molecule of claim 39. 

58. A host cell comprising the isolated nucleic acid molecule of claim 39.

59. A host cell comprising the vector of claim 57. 

60. A method of producing a polynucleotide construct that encodes a fusion 
protein that comprises an amino acid sequence tag, said method 
comprising: 

(a) obtaining a first nucleic acid molecule comprising a nucleotide 
sequence of interest; 

(b) obtaining a second nucleic acid molecule comprising at least two 
topoisomerase recognition sites, at least one topoisomerase, and at 
least one nucleic acid sequence which encodes an amino acid sequence tag;

(c) mixing said first nucleic acid molecule with said second nucleic acid 
molecule; and 

(d) incubating said mixture under conditions such that said first nucleic
acid molecule is inserted into said second nucleic acid molecule 
between said at least two topoisomerase recognition sites, thereby 
producing a product polynucleotide construct; 

wherein said product polynucleotide construct encodes a fusion protein 
comprising: (i) said amino acid sequence tag; and (ii) the amino acid sequence 
encoded by said nucleotide sequence of interest.

61. The method of claim 60, wherein said second nucleic acid molecule 
further comprises a nucleic acid sequence that encodes an amino acid 
sequence that is capable of being cleaved by one or more proteases; and 
wherein said product polynucleotide construct encodes a fusion protein 
comprising: (i) said amino acid sequence that is capable of being cleaved by 
one or more proteases, flanked on one side by (ii) said amino acid sequence 
tag, and on the other side by (iii) the amino acid sequence encoded by said 
nucleotide sequence of interest. 

62. The method of claim 61, wherein said amino acid sequence that is capable
of being cleaved by one or more proteases is an amino acid sequence that 
is capable of being cleaved by enterokinase. 

63. The method of claim 60, wherein said amino acid sequence tag is an amino 
acid sequence that is capable of being post-translationally modified. 

64. The method of claim 63, wherein said amino acid sequence that is capable 
of being post-translationally modified is an amino acid sequence that is 
capable of being post-translationally modified by biotinylation, attachment 
of 4-phosphopanthetheine, attachment of lipoic acid or attachment of 
flavins. 

65. The method of claim 63, wherein said amino acid sequence that is capable 
of being post-translationally modified is an amino acid sequence that is 
capable of being biotinylated. 

66. The method of claim 65, wherein said amino acid sequence that is
capable of being biotinylated is all or a portion of the Klebsiella
pneumoniae oxalacetate decarboxylase α subunit, all or a portion of the
Propionibacterium shermanii transcarboxylase 1.3S subunit, or all or a 
portion of the Escherichia coli biotin carboxyl carrier protein component 
of acetyl-CoA carboxylase. 

67. The method of claim 65, wherein said amino acid sequence that is
capable of being biotinylated is a portion of the C-terminus of the
Klebsiella pneumoniae oxalacetate decarboxylase α subunit.



68. The method of claim 67, wherein said amino acid sequence that is capable 
of being biotinylated is the BIOTAG™. 

69. The method of claim 60, wherein said second nucleic acid molecule is a
vector. 

70. The method of claim 60, wherein said first nucleic acid molecule is a 
linear nucleic acid molecule. 

71. The method of claim 70, wherein said first nucleic acid molecule is a 
blunt-end nucleic acid molecule. 

72. The method of claim 60, wherein said first nucleic acid molecule is a PCR 
product. 

73. The method of claim 60, further comprising inserting said product 
polynucleotide construct into a host cell. 

74. The method of claim 61, further comprising inserting said product 
polynucleotide construct into a host cell. 

75. The method of claim 60, wherein said second nucleic acid molecule 
comprises at least one additional nucleic acid sequence selected from the 
group consisting of a selectable marker, a cloning site, a restriction site, a 
promoter, an operator, an operon, a nucleotide sequence encoding a gene 
product which allows for negative selection, an origin of replication, a 
nucleotide sequence which encodes a repressor of at least one promoter, 
and a gene or partial gene. 

76. The method of claim 60, wherein said topoisomerase is a type I 
topoisomerase. 

77. The method of claim 76, wherein said type I topoisomerase is a type IB 
topoisomerase. 

78. The method of claim 77, wherein said type IB topoisomerase is selected
from the group consisting of eukaryotic nuclear type I topoisomerase and a 
poxvirus topoisomerase. 

79. The method of claim 78, wherein said poxvirus topoisomerase is produced 
by or isolated from a virus selected from the group consisting of vaccinia 
virus, Shope fibroma virus, ORF virus, fowlpox virus, molluscum 
contagiosum virus and Amsacta moorei entomopoxvirus. 

80. An isolated nucleic acid molecule comprising: 

(a) one or more recombination sites; 

(b) one or more topoisomerase recognition sites and/or one or more 
topoisomerases; and 

(c) one or more nucleic acid sequences which encode an amino acid 
sequence tag.

81. The isolated nucleic acid molecule of claim 80, further comprising at least 
one additional nucleic acid sequence selected from the group consisting of 
a selectable marker, a cloning site, a restriction site, a promoter, an 
operator, an operon, a nucleotide sequence encoding a gene product which 
allows for negative selection, an origin of replication, a nucleotide 
sequence which encodes a repressor of at least one promoter, and a gene or 
partial gene. 

82. The isolated nucleic acid molecule of claim 80, wherein a nucleic acid 
sequence of interest can be inserted at or within 20 nucleotides of said one 
or more recombination sites, thereby producing a polynucleotide construct 
that encodes a fusion protein, said fusion protein comprising: (i) said 
amino acid sequence tag; and (ii) the amino acid sequence encoded by said 
nucleic acid sequence of interest. 

83. The isolated nucleic acid molecule of claim 80, wherein a nucleic acid
sequence of interest can be inserted at or within 20 nucleotides of said one 
or more topoisomerase recognition sites and/or at or within 20 nucleotides 
of the position of said one or more topoisomerases, thereby producing a 
polynucleotide construct that encodes a fusion protein, said fusion protein 
comprising: (i) said amino acid tag; and (ii) the amino acid sequence 
encoded by said nucleic acid sequence of interest. 

84. The isolated nucleic acid molecule of claim 80, further comprising a 
nucleic acid sequence that encodes an amino acid sequence that is capable 
of being cleaved by one or more proteases. 

85. The isolated nucleic acid molecule of claim 84, wherein said amino acid 
sequence that is capable of being cleaved by one or more proteases is an 
amino acid sequence that is capable of being cleaved by enterokinase. 

86. The isolated nucleic acid molecule of claim 84, wherein a nucleic acid 
sequence of interest can be inserted at or within 20 nucleotides of said one 
or more recombination sites, thereby producing a polynucleotide construct 
that encodes a fusion protein, said fusion protein comprising: (i) said 
amino acid sequence that is capable of being cleaved by one or more 
proteases, flanked on one side by (ii) said amino acid sequence tag, and on 
the other side by (iii) the amino acid sequence encoded by said nucleic 
acid sequence of interest. 

87. The isolated nucleic acid molecule of claim 84, wherein a nucleic acid 
sequence of interest can be inserted at or within 20 nucleotides of said one 
or more topoisomerase recognition sites and/or at or within 20 nucleotides 
of the position of said one or more topoisomerases, thereby producing a 
polynucleotide construct that encodes a fusion protein, said fusion protein 
comprising: (i) said amino acid sequence that is capable of being cleaved 
by one or more proteases, flanked on one side by (ii) said amino acid 
sequence tag, and on the other side by (iii) the amino acid sequence 
encoded by said nucleic acid sequence of interest. 

88. The isolated nucleic acid molecule of claim 80, wherein said amino acid 
sequence tag is an amino acid sequence that is capable of being post- 
translationally modified. 

89. The isolated nucleic acid molecule of claim 88, wherein said amino acid 
sequence that is capable of being post-translationally modified is an amino 
acid sequence that is capable of being post-translationally modified by 
biotinylation, attachment of 4'-phosphopantetheine, attachment of lipoic 
acid or attachment of flavins. 

90. The isolated nucleic acid molecule of claim 80, wherein said amino acid 
sequence that is capable of being post-translationally modified is an amino 
acid sequence that is capable of being biotinylated. 

91. The isolated nucleic acid molecule of claim 90, wherein said amino acid 
sequence that is capable of being biotinylated is all or a portion of the 
Klebsiella pneumoniae oxalacetate decarboxylase α subunit, all or a 
portion of the Propionibacterium shermanii transcarboxylase 1.3S subunit, 
or all or a portion of the Escherichia coli biotin carboxyl carrier protein 
component of acetyl-CoA carboxylase. 

92. The isolated nucleic acid molecule of claim 90, wherein said amino acid 
sequence that is capable of being biotinylated is a portion of the C-terminus 
of the Klebsiella pneumoniae oxalacetate decarboxylase α subunit. 



93. The isolated nucleic acid molecule of claim 92, wherein said amino acid 
sequence that is capable of being biotinylated is the BIOTAG™. 

94. The isolated nucleic acid molecule of claim 80, wherein said nucleic acid 
molecule is a circular molecule. 

95. The isolated nucleic acid molecule of claim 80, wherein said nucleic acid 
molecule comprises two or more recombination sites. 

96. The isolated nucleic acid molecule of claim 80, wherein said 
recombination sites are selected from the group consisting of: (a) attB 
sites, (b) attP sites, (c) attL sites, (d) attR sites, (e) lox sites, (f) psi sites, 
(g) dif sites, (h) cer sites, (i) frt sites, and mutants, variants, and derivatives 
of the recombination sites of (a), (b), (c), (d), (e), (f), (g), (h), or (i) which 
retain the ability to undergo recombination. 

97. The isolated nucleic acid molecule of claim 80, wherein said 
topoisomerase is a type I topoisomerase. 

98. The isolated nucleic acid molecule of claim 97, wherein said type I 
topoisomerase is a type IB topoisomerase. 

99. The isolated nucleic acid molecule of claim 98, wherein said type IB 
topoisomerase is selected from the group consisting of eukaryotic nuclear 
type I topoisomerase and a poxvirus topoisomerase. 

100. The isolated nucleic acid molecule of claim 99, wherein said poxvirus 
topoisomerase is produced by or isolated from a virus selected from the 
group consisting of vaccinia virus, Shope fibroma virus, ORF virus, 
fowlpox virus, molluscum contagiosum virus and Amsacta moorei 
entomopoxvirus. 

101. A vector comprising the isolated nucleic acid molecule of claim 80. 

102. A host cell comprising the isolated nucleic acid molecule of claim 80. 

103. A host cell comprising the vector of claim 101. 

104. A method of producing a polynucleotide construct that encodes a 
fusion protein that comprises an amino acid sequence tag, said method 
comprising: 

(a) obtaining a first nucleic acid molecule comprising a nucleotide 
sequence of interest; 

(b) obtaining a second nucleic acid molecule comprising (i) at least a first 
topoisomerase recognition site flanked by (ii) at least a first 
recombination site, and (iii) at least a second topoisomerase 
recognition site flanked by (iv) at least a second recombination site, 
wherein said first and second recombination sites do not recombine 
with each other, and (v) at least one topoisomerase; 

(c) obtaining a third nucleic acid molecule comprising: (i) at least a third 
and fourth recombination sites that do not recombine with each other; 
and (ii) one or more nucleic acid sequences which encode an amino 
acid sequence tag; 

(d) mixing said first nucleic acid molecule with said second nucleic acid 
molecule; 

(e) incubating said mixture under conditions such that said first nucleic 
acid molecule is inserted into said second nucleic acid molecule 
between said at least two topoisomerase recognition sites, thereby 
producing a first product polynucleotide construct; 

(f) contacting said first product polynucleotide construct with said third 
nucleic acid molecule under conditions favoring recombination 
between said first and third and between said second and fourth 
recombination sites, thereby producing a second product 
polynucleotide construct; 

wherein said second product polynucleotide construct encodes a fusion 
protein comprising: (i) said amino acid sequence tag; and (ii) the amino acid 
sequence encoded by said nucleotide sequence of interest. 

105. The method of claim 104, wherein said third nucleic acid molecule 
further comprises a nucleic acid sequence that encodes an amino acid 
sequence that is capable of being cleaved by one or more proteases; and 
wherein said second product polynucleotide construct encodes a fusion 
protein comprising: (i) said amino acid sequence that is capable of being 
cleaved by one or more proteases, flanked on one side by (ii) said amino acid 
sequence tag, and on the other side by (iii) the amino acid sequence encoded by 
said nucleotide sequence of interest. 

106. The method of claim 105, wherein said amino acid sequence that is 
capable of being cleaved by one or more proteases is an amino acid 
sequence that is capable of being cleaved by enterokinase. 

107. The method of claim 104, wherein said amino acid sequence tag is an 
amino acid sequence that is capable of being post-translationally modified. 

108. The method of claim 107, wherein said amino acid sequence that is 
capable of being post-translationally modified is an amino acid sequence 
that is capable of being post-translationally modified by biotinylation, 
attachment of 4'-phosphopantetheine, attachment of lipoic acid or 
attachment of flavins. 

109. The method of claim 107, wherein said amino acid sequence that is 
capable of being post-translationally modified is an amino acid sequence 
that is capable of being biotinylated. 

110. The method of claim 109, wherein said amino acid sequence 
that is capable of being biotinylated is all or a portion of the Klebsiella 
pneumoniae oxalacetate decarboxylase α subunit, all or a portion of the 
Propionibacterium shermanii transcarboxylase 1.3S subunit, or all or a 
portion of the Escherichia coli biotin carboxyl carrier protein component 
of acetyl-CoA carboxylase. 

111. The method of claim 109, wherein said amino acid sequence 
that is capable of being biotinylated is a portion of the C-terminus of the 
Klebsiella pneumoniae oxalacetate decarboxylase α subunit. 

112. The method of claim 111, wherein said amino acid sequence that is 
capable of being biotinylated is the BIOTAG™. 

113. The method of claim 104, wherein said second nucleic acid molecule is 
a vector. 

114. The method of claim 104, wherein said third nucleic acid molecule is a 
vector. 

115. The method of claim 104, wherein said first nucleic acid molecule is a 
linear nucleic acid molecule. 

116. The method of claim 115, wherein said first nucleic acid molecule is a 
blunt-end nucleic acid molecule. 

117. The method of claim 104, wherein said first nucleic acid molecule is a 
PCR product. 

118. The method of claim 104, further comprising inserting said first 
product polynucleotide construct into a host cell. 

119. The method of claim 104, further comprising inserting said second 
product polynucleotide construct into a host cell. 

120. The method of claim 104, wherein said second and/or said third 
nucleic acid molecules comprises at least one additional nucleic acid 
sequence selected from the group consisting of a selectable marker, a 
cloning site, a restriction site, a promoter, an operator, an operon, a 
nucleotide sequence encoding a gene product which allows for negative 
selection, an origin of replication, a nucleotide sequence which encodes a 
repressor of at least one promoter, and a gene or partial gene. 

121. The method of claim 104, wherein said first, second, third and fourth 
recombination sites are selected from the group consisting of: (a) attB 
sites, (b) attP sites, (c) attL sites, (d) attR sites, (e) lox sites, (f) psi sites, 
(g) dif sites, (h) cer sites, (i) frt sites, and mutants, variants, and derivatives 
of the recombination sites of (a), (b), (c), (d), (e), (f), (g), (h), or (i) which 
retain the ability to undergo recombination. 

122. The method of claim 104, wherein said topoisomerase is a type I 
topoisomerase. 

123. The method of claim 122, wherein said type I topoisomerase is a type 
IB topoisomerase. 

124. The method of claim 123, wherein said type IB topoisomerase is 
selected from the group consisting of eukaryotic nuclear type I 
topoisomerase and a poxvirus topoisomerase. 

125. The method of claim 124, wherein said poxvirus topoisomerase is 
produced by or isolated from a virus selected from the group consisting of 
vaccinia virus, Shope fibroma virus, ORF virus, fowlpox virus, molluscum 
contagiosum virus and Amsacta moorei entomopoxvirus. 

126. The method of claim 104, wherein said first product polynucleotide 
construct and said third nucleic acid molecule are combined in the 
presence of at least one recombination protein. 

127. The method of claim 126, wherein said recombination protein is 
selected from the group consisting of: (a) Cre, (b) Int, (c) IHF, (d) Xis, (e) 
Fis, (f) Hin, (g) Gin, (h) Cin, (i) Tn3 resolvase, (j) TndX, (k) XerC, and (l) 
XerD. 

128. The method of claim 126, wherein said recombination protein is Cre. 

129. A vector selected from the group consisting of pET104-DEST, 
pET104/GW/lacZ, pET104/D-TOPO, pET104/D/lacZ, pcDNA6/Biotag™-
DEST, pcDNA6/Biotag™-GW/lacZ, pcDNA6/Biotag™/D-TOPO, 
pcDNA6/Biotag™/lacZ, pMT/Biotag™-DEST, and pMT/Biotag™/GW-
lacZ. 

130. A kit comprising the isolated nucleic acid molecule of claim 1. 

131. The kit of claim 130, further comprising one or more components 
selected from the group consisting of one or more topoisomerases, one or 
more recombination proteins, one or more vectors, one or more 
polypeptides having polymerase activity, one or more host cells, and one 
or more support matrices complexed with avidin or an avidin analog. 

132. A kit comprising the isolated nucleic acid molecule of claim 39. 

133. The kit of claim 132, further comprising one or more components 
selected from the group consisting of one or more topoisomerases, one or 
more recombination proteins, one or more vectors, one or more 
polypeptides having polymerase activity, one or more host cells, and one 
or more support matrices complexed with avidin or an avidin analog. 

134. A kit comprising the isolated nucleic acid molecule of claim 80. 

135. The kit of claim 134, further comprising one or more components 
selected from the group consisting of one or more topoisomerases, one or 
more recombination proteins, one or more vectors, one or more 
polypeptides having polymerase activity, one or more host cells, and one 
or more support matrices complexed with avidin or an avidin analog. 

136. A host cell comprising a polynucleotide construct that encodes a fusion 
protein capable of being post-translationally modified, said polynucleotide 
construct produced according to the method of claim 19. 

137. A host cell comprising a polynucleotide construct that encodes a fusion 
protein capable of being post-translationally modified, said polynucleotide 
construct produced according to the method of claim 60. 

138. A host cell comprising a polynucleotide construct that encodes a fusion 
protein capable of being post-translationally modified, said polynucleotide 
construct produced according to the method of claim 104. 

139. A method of producing a fusion protein that comprises an amino acid 
sequence tag, said method comprising: 

(a) obtaining the host cell of claim 136; and 

(b) culturing said host cell under conditions wherein said fusion protein is 
produced by said host cell. 

140. The method of claim 139, wherein said amino acid sequence tag is an 
amino acid sequence that is capable of being post-translationally modified. 

141. The method of claim 140 

143. The method of claim 139, further comprising: 

145. A method of producing a fusion protein that comprises an amino acid 
sequence tag, said method comprising: 

(a) obtaining the host cell of claim 137; and 

capable of being post-translationally modified. 

148. The method of claim 146, further comprising culturing said host cell 
under conditions wherein said fusion protein is biotinylated in said host 
cell. 

149. The method of claim 145, further comprising: 

(a) treating said host cell such that said fusion protein is released from 
said host cell; and 

(b) contacting said fusion protein with a detecting composition comprising 
a molecule that is capable of interacting with said amino acid sequence 
tag or with a molecular entity that is attached to said amino acid 
sequence tag. 

150. The method of claim 149, wherein said fusion protein is a biotinylated 
fusion protein, and said detecting composition comprises avidin or an 
avidin analogue. 

151. A method of producing a fusion protein that comprises an amino acid 
sequence tag, said method comprising: 

(a) obtaining the host cell of claim 138; and 

(b) culturing said host cell under conditions wherein said fusion protein is 
produced by said host cell. 

152. The method of claim 151, wherein said amino acid sequence tag is an 
amino acid sequence that is capable of being post-translationally modified. 

153. The method of claim 152, further comprising culturing said host cell 
under conditions wherein said fusion protein is post-translationally 
modified in said host cell. 

154. The method of claim 152, further comprising culturing said host cell 
under conditions wherein said fusion protein is biotinylated in said host 
cell. 

155. The method of claim 151, further comprising: 

(a) treating said host cell such that said fusion protein is released from 
said host cell; and 

(b) contacting said fusion protein with a detecting composition 
comprising a molecule that is capable of interacting with said amino 
acid sequence tag or with a molecular entity that is attached to said 
amino acid sequence tag. 

156. The method of claim 155, wherein said post-translationally modified 
fusion protein is a biotinylated fusion protein, and said detecting 
composition comprises avidin or an avidin analogue. 
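
By way of a non-limiting illustration of the protease cleavage sequence recited in claims 84-87 and 105-106, the following Python sketch translates a nucleotide sequence in each forward reading frame and reports where an enterokinase recognition motif is encoded. The motif Asp-Asp-Asp-Asp-Lys (DDDDK) is the commonly cited enterokinase recognition sequence and is an assumption here, since the claims do not recite it. The function names translate and find_cleavage_sites are hypothetical and serve only this illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Assumption: canonical enterokinase recognition motif (not recited in the claims).
ENTEROKINASE_SITE = "DDDDK"


def translate(dna, frame=0):
    """Translate a DNA string in the given forward frame; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_cleavage_sites(dna, motif=ENTEROKINASE_SITE):
    """Return (frame, nucleotide_offset) for every occurrence of motif in frames 0-2."""
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        start = protein.find(motif)
        while start != -1:
            # Convert the amino acid index back to a nucleotide offset.
            hits.append((frame, frame + 3 * start))
            start = protein.find(motif, start + 1)
    return hits


if __name__ == "__main__":
    fragment = "GATCTGTACGACGATGACGATAAG"
    print(translate(fragment))
    print(find_cleavage_sites(fragment))
```

For example, the fragment GATCTGTACGACGATGACGATAAG, which occurs in the nucleotide sequences reproduced after the claims, translates in frame 0 to DLYDDDDK, so the sketch reports a single site beginning at nucleotide offset 9 of that fragment.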

CAAGGAGATGGCGCCCAACAGTCCCCCGGCCACGGGGCCTGCCACCATACCCACGCCGAAACAAGCGCTC 

ATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCATCGGTGATGTCGGCGATATAGGCGCCAGCAACCGCAC 

CTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCGGCGTAGAGGATCGAGATCTCGATCCCGCGAAATTA 

ATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTA 

AGAAGGAGATATACATATGGGCGCCGGCACCCCGGTGACCGCCCCGCTGGCGGGCACTATCTGGAAGGTG 

CTGGCCAGCGAAGGCCAGACGGTGGCCGCAGGCGAGGTGCTGCTGATTCTGGAAGCCATGAAGATGGAAA 

CCGAAATCCGCGCCGCGCAGGCCGGGACCGTGCGCGGTATCGCGGTGAAAGCCGGCGACGCGGTGGCGGT 

CGGCGACACCCTGATGACCCTGGCGGGCTCTGGATCCGATCTGTACGACGATGACGATAAGGGAATTATC 

ACAAGTTTGTACAAAAAAGCTGAACGAGAAACGTAAAATGATATAAATATCAATATATTAAATTAGATTT 

TGCATAAAAAACAGACTACATAATACTGTAAAACACAACATATCCAGTCACTATGGCGGCCGCATTAGGC 

ACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGATTTTGAGTTAGGATCCGGCGAGAT 

TTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATG 

GCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTTCAGCTG 

GATATTACGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTC 

TTGCCCGCCTGATGAATGCTCATCCGGAATTCCGTATGGCAATGAAAGACGGTGAGCTGGTGATATGGGA 

TAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGAGTGAATAC 

CACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCT 

ATTTCCCTAAAGGGTTTATTGAGAATATGTTTTTCGTCTCAGCCAATCCCTGGGTGAGTTTCACCAGTTT 

TGATTTAAACGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTCACCATGGGCAAATATTATACGCAA 

GGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTCTGTGATGGCTTCCATGTCGGCA 

GAATGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAAACGCGTGGATCCGGCTT 

ACTAAAAGCCAGATAACAGTATGCGTATTTGCGCGCACCGGTGCTAGCGTATACCCGAAGTATGTCAAAA 

AGAGGTGTGCTATGAAGCAGCGTATTACAGTGACAGTTGACAGCGACAGCTATCAGTTGCTCAAGGCATA 

TATGATGTCAATATCTCCGGTCTGGTAAGCACAACCATGCAGAATGAAGCCCGTCGTCTGCGTGCCGAAC 

GCTGGAAAGCGGAAAATCAGGAAGGGATGGCTGAGGTCGCCCGGTTTATTGAAATGAACGGCTCTTTTGC 

TGACGAGAACAGGGACTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCT 

GTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCCCCCTGGCCAGTGCA 

CGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCA 

TGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCG 

CGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCGTTATACAC 

AGCCAGTCTGCAGGTCGACCATAGTGACTGGATATGTTGTGTTTTACAGTATTATGTAGTCTGTTTTTTA 

TGCAAAATCTAATTTAATATATTGATATTTATATCATTTTACGTTTCTCGTTCAGCTTTCTTGTACAAAG 

TGGTGATAATTAATTAAGATAGCTCAGATCCGGCTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGC 

TGCCACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTG 

AAAGGAGGAACTATATCCGGATATCCCGCAAGAGGCCCGGCAGTACCGGCATAACCAAGCCTATGCCTAC 

AGCATCCAGGGTGACGGTGCCGAGGATGACGATGAGCGCATTGTTAGATTTCATACACGGTGCCTGACTG 

CGTTAGCAATTTAACTGTGATAAACTACCGCATTAAAGCTAGCTTATCGATGATAAGCTGTCAAACATGA 

GAATTAATTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAAT 

GGTTTCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAA 

ATACATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGA 

AGAGTATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTT 

TGCTCACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATC 

GAACTGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCA 

CTTTTAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCG 

CATACACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATG 

ACAGTAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAA 

CGATCGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCG 

TTGGGAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGCAGCAATGGCA 

ACAACGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGA 

TGGAGGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAA 

ATCTGGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGT 

ATCGTAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAG 

GTGCCTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAA 

ACTTCATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAA 

CGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTT 

TTCTGCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCA 

AGAGCTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTA 

GTGTAGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCC 

TGTTACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACC 

GGATAAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTAC 

ACCGAACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACA 

GGTATCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTA 

TCTTTATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGG 

CGGAGCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTC 

ACATGTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATAC 

CGCTCGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGG 

TATTTTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATATGGTGCACTCTCAGTACAATCTGCTC 

TGATGCCGCATAGTTAAGCCAGTATACACTCCGCTATCGCTACGTGACTGGGTCATGGCTGCGCCCCGAC 

ACCCGCCAACACCCGCTGACGCGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGT 

GACCGTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGCTGCGG 

TAAAGCTCATCAGCGTGGTCGTGAAGCGATTCACAGATGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGA 

GTTTCTCCAGAAGCGTTAATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTT 

GGTCACTGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATGAAACGAGAGAGG 

ATGCTCACGATACGGGTTACTGATGATGAACATGCCCGGTTACTGGAACGTTGTGAGGGTAAACAACTGG 

CGGTATGGATGCGGCGGGACCAGAGAAAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATACAGATGT 

AGGTGTTCCACAGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCAGGGCGCTGAC 

TTCCGCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGTTGTTGCTCAGGTCGCAGACG 

TTTTGCAGCAGCAGTCGCTTCACGTTCGCTCGCGTATCGGTGATTCATTCTGCTAACCAGTAAGGCAACC 

CCGCCAGCCTAGCCGGGTCCTCAACGACAGGAGCACGATCATGCGCACCCGTGGCCAGGACCCAACGCTG 

CCCGAGATGCGCCGCGTGCGGCTGCTGGAGATGGCGGACGCGATGGATATGTTCTGCCAAGGGTTGGTTT 

GCGCATTCACAGTTCTCCGCAAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTAGCGAGGTG 

CCGCCGGCTTCCATTCAGGTCGAGGTGGCCCGGCTCCATGCACCGCGACGCAACGCGGGGAGGCAGACAA 

GGTATAGGGCGGCGCCTACAATCCATGCCAACCCGTTCCATGTGCTCGCCGAGGCGGCATAAATCGCCGT 

GACGATCAGCGGTCCAGTGATCGAAGTTAGGCTGGTAAGAGCCGCGAGCGATCCTTGAAGCTGTCCCTGA 

TGGTCGTCATCTACCTGCCTGGACAGCATGGCCTGCAACGCGGGCATCCCGATGCCGCCGGAAGCGAGAA 

GAATCATAATGGGGAAGGCCATCCAGCCTCGCGTCGCGAACGCCAGCAAGACGTAGCCCAGCGCGTCGGC 

CGCCATGCCGGCGATAATGGCCTGCTTCTCGCCGAAACGTTTGGTGGCGGGACCAGTGACGAAGGCTTGA 

GCGAGGGCGTGCAAGATTCCGAATACCGCAAGCGACAGGCCGATCATCGTCGCGCTCCAGCGAAAGCGGT 

CCTCGCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCCTACGAGTTGCATGATAAAGAAGACAGTCAT 

AAGTGCGGCGACGATAGTCATGCCCCGCGCCCACCGGAAGGAGCTGACTGGGTTGAAGGCTCTCAAGGGC 

ATCGGTCGAGATCCCGGTGCCTAATGAGTGAGCTAACTTACATTAATTGCGTTGCGCTCACTGCCCGCTT 

TCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCG 

TATTGGGCGCCAGGGTGGTTTTTCTTTTCACCAGTGAGACGGGCAACAGCTGATTGCCCTTCACCGCCTG 

GCCCTGAGAGAGTTGCAGCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTG 

GTTAACGGCGGGATATAACATGAGCTGTCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAA 

CGCGCAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGC 

AGTGGGAACGATGCCCTCATTCAGCATTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCT 

TCCCGTTCCGCTATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCG 

CCGAGACAGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCAATGCGACCAGATGCTCCAC 

GCCCAGTCGCGTACCGTCTTCATGGGAGAAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGA 




AATAACGCCGGRACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGTTAA 
TGATCAGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGCCGCTTCG 
TTCTACCATCGACACCACCACGCTGGCACCCAGTTGATCGGCGCGAGATTTAATCGCCGCGACAATTTGC 
GACGGCGCGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTT 
GTGCCACGCGGTTGGGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGA 
AACGTGGCTGGCCTGGTTCACCACGCGGGAAACGGTCTGATAAGAGACACCGGCATACTCTGCGACATCG 
TATAACGTTACTGGTTTCACATTCACCACCCTGAATTGACTCTCTTCCGGGCGCTATCATGCCATACCGC 
GAAAGGTTTTGCGCCATTCGATGGTGTCCGGGATCTCGACGCTCTCCCTTATGCGACTCCTGCATTAGGA 
AGCAGCCCAGTAGTAGGTTGAGGCCGTTGAGCACCGCCGCCGCAAGGAATGGTGCATG 

CAAGGAGATGGCGCCCAACAGTCCCCCGGCCACGGGGCCTGCCACCATACCCACGCCGAAACAAGCGCTC 

ATGAGCCCGAAGTGGCGAGCCCGATCTTCCCCATCGGTGATGTCGGCGATATAGGCGCCAGCAACCGCAC 

CTGTGGCGCCGGTGATGCCGGCCACGATGCGTCCGGCGTAGAGGATCGAGATCTCGATCCCGCGAAATTA 

ATACGACTCACTATAGGGGAATTGTGAGCGGATAACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTA 

AGAAGGAGATATACATATGGGCGCCGGCACCCCGGTGACCGCCCCGCTGGCGGGCACTATCTGGAAGGTG 

CTGGCCAGCGAAGGCCAGACGGTGGCCGCAGGCGAGGTGCTGCTGATTCTGGAAGCCATGAAGATGGAAA 

CCGAAATCCGCGCCGCGCAGGCCGGGACCGTGCGCGGTATCGCGGTGAAAGCCGGCGACGCGGTGGCGGT 

CGGCGACACCCTGATGACCCTGGCGGGCTCTGGATCCGATCTGTACGACGATGACGATAAGGGAATTGAT 

CCCTTCACCAAGGGCGAGCTCAGATCCGGGTGCTAACAAAGCCCGAAAGGAAGCTGAGTTGGCTGCTGCC 

ACCGCTGAGCAATAACTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGGTTTTTTGCTGAAAG 

GAGGAACTATATCCGGATATCCCGCAAGAGGCCCGGCAGTACCGGCATAACCAAGCCTATGCCTACAGCA 

TCCAGGGTGACGGTGCCGAGGATGACGATGAGCGCATTGTTAGATTTCATACACGGTGCCTGACTGCGTT 

AGCAATTTAACTGTGATAAACTACCGCATTAAAGCTAGCTTATCGATGATAAGCTGTCAAACATGAGAAT 

TAATTCTTGAAGACGAAAGGGCCTCGTGATACGCCTATTTTTATAGGTTAATGTCATGATAATAATGGTT 

TCTTAGACGTCAGGTGGCACTTTTCGGGGAAATGTGCGCGGAACCCCTATTTGTTTATTTTTCTAAATAC 

ATTCAAATATGTATCCGCTCATGAGACAATAACCCTGATAAATGCTTCAATAATATTGAAAAAGGAAGAG 

TATGAGTATTCAACATTTCCGTGTCGCCCTTATTCCCTTTTTTGCGGCATTTTGCCTTCCTGTTTTTGCT 

CACCCAGAAACGCTGGTGAAAGTAAAAGATGCTGAAGATCAGTTGGGTGCACGAGTGGGTTACATCGAAC 

TGGATCTCAACAGCGGTAAGATCCTTGAGAGTTTTCGCCCCGAAGAACGTTTTCCAATGATGAGCACTTT 

TAAAGTTCTGCTATGTGGCGCGGTATTATCCCGTGTTGACGCCGGGCAAGAGCAACTCGGTCGCCGCATA 

CACTATTCTCAGAATGACTTGGTTGAGTACTCACCAGTCACAGAAAAGCATCTTACGGATGGCATGACAG 

TAAGAGAATTATGCAGTGCTGCCATAACCATGAGTGATAACACTGCGGCCAACTTACTTCTGACAACGAT 

CGGAGGACCGAAGGAGCTAACCGCTTTTTTGCACAACATGGGGGATCATGTAACTCGCCTTGATCGTTGG 

GAACCGGAGCTGAATGAAGCCATACCAAACGACGAGCGTGACACCACGATGCCTGCAGCAATGGCAACAA 

CGTTGCGCAAACTATTAACTGGCGAACTACTTACTCTAGCTTCCCGGCAACAATTAATAGACTGGATGGA 

GGCGGATAAAGTTGCAGGACCACTTCTGCGCTCGGCCCTTCCGGCTGGCTGGTTTATTGCTGATAAATCT 

GGAGCCGGTGAGCGTGGGTCTCGCGGTATCATTGCAGCACTGGGGCCAGATGGTAAGCCCTCCCGTATCG 

TAGTTATCTACACGACGGGGAGTCAGGCAACTATGGATGAACGAAATAGACAGATCGCTGAGATAGGTGC 

CTCACTGATTAAGCATTGGTAACTGTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTT 

CATTTTTAATTTAAAAGGATCTAGGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTG 

AGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAAAGATCAAAGGATCTTCTTGAGATCCTTTTTTTCT 

GCGCGTAATCTGCTGCTTGCAAACAAAAAAACCACCGCTACCAGCGGTGGTTTGTTTGCCGGATCAAGAG 

CTACCAACTCTTTTTCCGAAGGTAACTGGCTTCAGCAGAGCGCAGATACCAAATACTGTCCTTCTAGTGT 

AGCCGTAGTTAGGCCACCACTTCAAGAACTCTGTAGCACCGCCTACATACCTCGCTCTGCTAATCCTGTT 

ACCAGTGGCTGCTGCCAGTGGCGATAAGTCGTGTCTTACCGGGTTGGACTCAAGACGATAGTTACCGGAT 

AAGGCGCAGCGGTCGGGCTGAACGGGGGGTTCGTGCACACAGCCCAGCTTGGAGCGAACGACCTACACCG 

AACTGAGATACCTACAGCGTGAGCTATGAGAAAGCGCCACGCTTCCCGAAGGGAGAAAGGCGGACAGGTA 

TCCGGTAAGCGGCAGGGTCGGAACAGGAGAGCGCACGAGGGAGCTTCCAGGGGGAAACGCCTGGTATCTT 

TATAGTCCTGTCGGGTTTCGCCACCTCTGACTTGAGCGTCGATTTTTGTGATGCTCGTCAGGGGGGCGGA 
GCCTATGGAAAAACGCCAGCAACGCGGCCTTTTTACGGTTCCTGGCCTTTTGCTGGCCTTTTGCTCACAT 
GTTCTTTCCTGCGTTATCCCCTGATTCTGTGGATAACCGTATTACCGCCTTTGAGTGAGCTGATACCGCT 
CGCCGCAGCCGAACGACCGAGCGCAGCGAGTCAGTGAGCGAGGAAGCGGAAGAGCGCCTGATGCGGTATT 
TTCTCCTTACGCATCTGTGCGGTATTTCACACCGCATATATGGTGCACTCTCAGTACAATCTGCTCTGAT 
GCCGCATAGTTAAGCCAGTATACACTCCGCTATCGCTACGTGACTGGGTCATGGCTGCGCCCCGACACCC 
GCCAACACCCGCTGACGGGCCCTGACGGGCTTGTCTGCTCCCGGCATCCGCTTACAGACAAGCTGTGACC 
GTCTCCGGGAGCTGCATGTGTCAGAGGTTTTCACCGTCATCACCGAAACGCGCGAGGCAGCTGCGGTAAA 
GCTCATCAGCGTGGTCGTGAAGCGATTCACAGATGTCTGCCTGTTCATCCGCGTCCAGCTCGTTGAGTTT 
CTCCAGAAGCGTTAATGTCTGGCTTCTGATAAAGCGGGCCATGTTAAGGGCGGTTTTTTCCTGTTTGGTC 
ACTGATGCCTCCGTGTAAGGGGGATTTCTGTTCATGGGGGTAATGATACCGATGAAACGAGAGAGGATGC 
TCACGATACGGGTTACTGATGATGAACATGCCCGGTTACTGGAACGTTGTGAGGGTAAACAACTGGCGGT 
ATGGATGCGGCGGGACCAGAGAAAAATCACTCAGGGTCAATGCCAGCGCTTCGTTAATACAGATGTAGGT 

GTTCCACAGGGTAGCCAGCAGCATCCTGCGATGCAGATCCGGAACATAATGGTGCAGGGCGCTGACTTCC 

GCGTTTCCAGACTTTACGAAACACGGAAACCGAAGACCATTCATGTTGTTGCTCAGGTCGCAGACGTTTT 

GCAGCAGCAGTCGCTTCACGTTCGCTCGCGTATCGGTGATTCATTCTGCTAACCAGTAAGGCAACCCCGC 

CAGCCTAGCCGGGTCCTCAACGACAGGAGCACGATCATGCGCACCCGTGGCCAGGACCCAACGCTGCCCG 

AGATGCGCCGCGTGCGGCTGCTGGAGATGGCGGACGCGATGGATATGTTCTGCCAAGGGTTGGTTTGCGC 

ATTCACAGTTCTCCGCAAGAATTGATTGGCTCCAATTCTTGGAGTGGTGAATCCGTTAGCGAGGTGCCGC 

CGGCTTCCATTCAGGTCGAGGTGGCCCGGCTCCATGCACCGCGACGCAACGCGGGGAGGCAGACAAGGTA 

TAGGGCGGCGCCTACAATCCATGCCAACCCGTTCCATGTGCTCGCCGAGGCGGCATAAATCGCCGTGACG 

ATCAGCGGTCCAGTGATCGAAGTTAGGCTGGTAAGAGCCGCGAGCGATCCTTGAAGCTGTCCCTGATGGT 

CGTCATCTACCTGCCTGGACAGCATGGCCTGCAACGCGGGCATCCCGATGCCGCCGGAAGCGAGAAGAAT 

CATAATGGGGAAGGCCATCCAGCCTCGCGTCGCGAACGCCAGCAAGACGTAGCCCAGCGCGTCGGCCGCC 

ATGCCGGCGATAATGGCCTGCTTCTCGCCGAAACGTTTGGTGGCGGGACCAGTGACGAAGGCTTGAGCGA 

GGGCGTGCAAGATTCCGAATACCGCAAGCGACAGGCCGATCATCGTCGCGCTCCAGCGAAAGCGGTCCTC 

GCCGAAAATGACCCAGAGCGCTGCCGGCACCTGTCCTACGAGTTGCATGATAAAGAAGACAGTCATAAGT 

GCGGCGACGATAGTCATGCCCCGCGCCCACCGGAAGGAGCTGACTGGGTTGAAGGCTCTCAAGGGCATCG 

GTCGAGATCCCGGTGCCTAATGAGTGAGCTAACTTACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCA 

GTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATT 

GGGCGCCAGGGTGGTTTTTCTTTTCACCAGTGAGACGGGCAACAGCTGATTGCCCTTCACCGCCTGGCCC 

TGAGAGAGTTGCAGCAAGCGGTCCACGCTGGTTTGCCCCAGCAGGCGAAAATCCTGTTTGATGGTGGTTA 

ACGGCGGGATATAACATGAGCTGTCTTCGGTATCGTCGTATCCCACTACCGAGATATCCGCACCAACGCG 

CAGCCCGGACTCGGTAATGGCGCGCATTGCGCCCAGCGCCATCTGATCGTTGGCAACCAGCATCGCAGTG 

GGAACGATGCCCTCATTCAGCATTTGCATGGTTTGTTGAAAACCGGACATGGCACTCCAGTCGCCTTCCC 

GTTCCGCTATCGGCTGAATTTGATTGCGAGTGAGATATTTATGCCAGCCAGCCAGACGCAGACGCGCCGA 

GACAGAACTTAATGGGCCCGCTAACAGCGCGATTTGCTGGTGACCCAATGCGACCAGATGCTCCACGCCC 

AGTCGCGTACCGTCTTCATGGGAGAAAATAATACTGTTGATGGGTGTCTGGTCAGAGACATCAAGAAATA 

ACGCCGGAACATTAGTGCAGGCAGCTTCCACAGCAATGGCATCCTGGTCATCCAGCGGATAGTTAATGAT 

CAGCCCACTGACGCGTTGCGCGAGAAGATTGTGCACCGCCGCTTTACAGGCTTCGACGCCGCTTCGTTCT 

ACCATCGACACCACCACGCTGGCAGCCAGTTGATCGGCGCGAGATTTAATCGCCGCGACAATTTGCGACG 

GCGCGTGCAGGGCCAGACTGGAGGTGGCAACGCCAATCAGCAACGACTGTTTGCCCGCCAGTTGTTGTGC 

CACGCGGTTGGGAATGTAATTCAGCTCCGCCATCGCCGCTTCCACTTTTTCCCGCGTTTTCGCAGAAACG 

TGGCTGGCCTGGTTCACCACGCGGGAAACGGTCTGATAAGAGACACCGGCATACTCTGCGACATCGTATA 

ACGTTACTGGTTTCACATTCACCACCCTGAATTGACTCTCTTCCGGGCGCTATCATGCCATACCGCGAAA 

GGTTTTGCGCCATTCGATGGTGTCCGGGATCTCGACGCTCTCCCTTATGCGACTCCTGCATTAGGAAGCA 

GCCCAGTAGTAGGTTGAGGCCGTTGAGCACCGCCGCCGCAAGGAATGGTGCATG 

GACGGATCGGGAGATCTCCCGATCCCCTATGGTCGACTCTCAGTACAATCTGCTCTGATGCCGCATAGTT 

AAGCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATT^ 

AC^GGCAAGGCTTGACCGACAATTGCATGAAGAATCTGCTTAGGG^ 

ATGTACGGGCCAGATATACGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGG^ 

CCCMCGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCC^ 
ATTGACGTCAATGGGTGGACTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCC 

aagtacgccccctattgacgtcaatgacggtaaatggcccgcctggcattatgcccagtacatgacctS 

TGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGC 

AGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAA 

TGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACG 

CAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTCTCTGGCTAACTAGAGAACCCA 

CTGCTTACTGGCTTATCGAAATTAATACGACTCACTATAGGGAGACCCAAGCTGGCTAGCGTTTAAACTT 

AAGCTTACCATGGGCGCCGGCACCCCGGTGACCGCCCCGCTGGCGGGCACTATCTGGAAGGTGCTGGCCA 

GCGAAGGCCAGACGGTGGCCGCAGGCGAGGTGCTGCTGATTCTGGAAGCCATGAAGATGGAAACCGAAAT 

CCGCGCCGCGCAGGCCGGGACCGTGCGCGGTATCGCGGTGAAAGCCGGCGACGCGGTGGCGGTCGGCGAC 

ACCCTGATGACCCTGGCGGGCTCTGGATCCGATCTGTACGACGATGACGATAAGGTACATCAAACAAGTT 

TGTACAAAAAAGCTGAACGAGAAACGTAAAATGATATAAATATCAATATATTAAATTAGATTTTGCATAA 

AAAACAGACTACATAATACTGTAAAACACAACATATCCAGTCACTATGGCGGCCGCATTAGGCACCCCAG 

GCTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGATTTTGAGTTAGGATCCGGCGAGATTTTCAGG 

AGCTAAGGAAGCTAAAATGGAGAAAAAAATCACTGGATATACCACCGTTGATATATCCCAATGGCATCGT 

AAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGTTCAGCTGGATATTA 

CGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATTCACATTCTTGCCCG 

CCTGATGAATGCTCATCCGGAATTCCGTATGGCAATGAAAGACGGTGAGCTGGTGATATGGGATAGTGTT 

CACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGAGTGAATACCACGACG 

ATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAACCTGGCCTATTTCCC 

TAAAGGGTTTATTGAGAATATGTTTTTCGTCTCAGCCAATCCCTGGGTGAGTTTCACCAGTTTTGATTTA 

AACGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTCACCATGGGCAAATATTATACGCAAGGCGACA 

AGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTCTGTGATGGCTTCCATGTCGGCAGAATGCT 

TAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAAACGCGTGGATCCGGCTTACTAAAA 

GCCAGATAACAGTATGCGTATTTGCGCGCTCGCGAACCGGTGTATACCCGAAGTATGTCAAAAAGAGGTG 

TGCTATGAAGCAGCGTATTACAGTGACAGTTGACAGCGACAGCTATCAGTTGCTCAAGGCATATATGATG 

TCAATATCTCCGGTCTGGTAAGCACAACCATGCAGAATGAAGCCCGTCGTCTGCGTGCCGAACGCTGGAA 

AGCGGAAAATCAGGAAGGGATGGCTGAGGTCGCCCGGTTTATTGAAATGAACGGCTCTTTTGCTGACGAG 

AACAGGGACTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTTATCGTCTGTTTGTG 

GATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCCCCCTGGCCAGTGCACGTCTGC 

TGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGCTGGCGCATGATGAC 

CACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCAGCCACCGCGAAAAT 

GACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCGTTATACACAGCCAGT 

CTGCAGGTCGACCATAGTGACTGGATATGTTGTGTTTTACAGTATTATGTAGTCTGTTTTTTATGCAAAA 

TCTAATTTAATATATTGATATTTATATCATTTTACGTTTCTCGTTCAGCTTTCTTGTACAAAGTGGTGAT 

AATTAATTAAGATCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTGTGCCTTCTAGTTGCCAGC 

CATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGTGCCACTCCCACTGTCCTTTCCTA 

ATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCTATTCTGGGGGGTGGGGTGGGGCAG 

GACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCATGCTGGGGATGCGGTGGGCTCTATGGCTTCTG 

AGGCGGAAAGAACCAGCTGGGGCTCTAGGGGGTATCCCCACGCGCCCTGTAGCGGCGCATTAAGCGCGGC 

TCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTAGGGTGATGGTTCACGTAGTGGGCC 

ATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGTTCTTTAATAGTGGACTCTTGTTC 
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CAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTTATAAGGGATTTTGGGGATTTCGG 
CCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAATTAATTCTGTGGAATGTGTGTCAG 

TTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGTCA 

GCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTATGCAAAGCATGCATCTCAATTAGT 

CAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACTCCGCCCAGTTCCGCCCATTCTCC 

GCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCGCCTCTGCCTCTGAGCTATTCCAG 

AAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCTCCCGGGAGCTTGTATATCCATTT 

TCGGATCTGATCAGCACGTGTTGACAATTAATCATCGGCATAGTATATCGGCATAGTATAATACGACAAG 

GTGAGGAACTAAACCATGGCCAAGCCTTTGTCTCAAGAAGAATCCACCCTCATTGAAAGAGCAACGGCTA 

CAATCAACAGCATCCCCATCTCTGAAGACTACAGCGTCGCCAGCGCAGCTCTCTCTAGCGACGGCCGCAT 

CTTCACTGGTGTCAATGTATATCATTTTACTGGGGGACCTTGTGCAGAACTCGTGGTGCTGGGCACTGCT 

GCTGCTGCGGCAGCTGGCAACCTGACTTGTATCGTCGCGATCGGAAATGAGAACAGGGGCATCTTGAGCC 

CCTGCGGACGGTGCCGACAGGTGCTTCTCGATCTGCATCCTGGGATCAAAGCCATAGTGAAGGACAGTGA 

TGGACAGCCGACGGCAGTTGGGATTCGTGAATTGCTGCCCTCTGGTTATGTGTGGGAGGGCTAAGCACTT 

CGTGGCCGAGGAGCAGGACTGACACGTGCTACGAGATTTCGATTCCACCGCCGCCTTCTATGAAAGGTTG 

GGCTTCGGAATCGTTTTCCGGGACGCCGGCTGGATGATCCTCCAGCGCGGGGATCTCATGCTGGAGTTCT 

TCGCCCACCCCAACTTGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCAC 

AAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTC 

TGTATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCATGGTCATAGCTGTTTCCTGTGTGAAATTGTT 

ATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGTAAAGCCTGGGGTGCCTAATGAGT 

GAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTG 

CATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCA 

CTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCTCACTCAAAGGCGGTAATACGGTT 

ATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAGGCCAGCAAAAGGCCAGGAACCGT 

AAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCT 

CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGT 

GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCG 

CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGC 

ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAG 

ACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGAGGTATGTAGGCGGTGCT 

ACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTGGTATCTGCGCTCTGC 

TGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGCAAACAAACCACCGCTGGTAGCGG 

TGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATCCTTTGATCTTT 

TCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAA 

GGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATGAGTAAAC 

TTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTTCATCC 

ATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTGCTG 

CAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGGC 

CGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGA 

GTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCT 

CGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTT 

GTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCA 

CTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTG 

GTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCGAGTTGCTCTTGCCCGGCGTCAAT 

ACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCATTGGAAAACGTTCTTCGGGGCGA 

AAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACCCACTCGTGCACCCAACTGATCTT 

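CAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGAAGGCAAAATGCCGCAAAAAAGGG 

AATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTCAATATTATTGAAGCATTTATCAG 

GGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAATAAACAAATAGGGGTTCCGCGCA 

CATTTCCCCGAAAAGTGCCACCTGACGTC 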
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GACGGATCGGGAGATCTCCCGATCCCCTATGGTCGACTCTCAGTACAATCTGCTCTGATGCCGCATAGTT 

AAGCCAGTATCTGCTCCCTGCTTGTGTGTTGGAGGTCGCTGAGTAGTGCGCGAGCAAAATTTAAGCTACA 

ACAAGGCAAGGCTTGACCGACAATTGCATGAAGAATCTGCTTAGGGTTAGGCGTTTTGCGCTGCTTCGCG 

ATGTACGGGCCAGATATACGCGTTGACATTGATTATTGACTAGTTATTAATAGTAATCAATTACGGGGTC 

ATTAGTTCATAGCCCATATATGGAGTTCCGCGTTACATAACTTACGGTAAATGGCCCGCCTGGCTGACCG 

CCCAACGACCCCCGCCCATTGACGTCAATAATGACGTATGTTCCCATAGTAACGCCAATAGGGACTTTCC 

ATTGACGTCAATGGGTGGACTATTTACGGTAAACTGCCCACTTGGCAGTACATCAAGTGTATCATATGCC 

AAGTACGCCCCCTATTGACGTCAATGACGGTAAATGGCCCGCCTGGCATTATGCCCAGTACATGACCTTA 

TGGGACTTTCCTACTTGGCAGTACATCTACGTATTAGTCATCGCTATTACCATGGTGATGCGGTTTTGGC 

AGTACATCAATGGGCGTGGATAGCGGTTTGACTCACGGGGATTTCCAAGTCTCCACCCCATTGACGTCAA 

TGGGAGTTTGTTTTGGCACCAAAATCAACGGGACTTTCCAAAATGTCGTAACAACTCCGCCCCATTGACG 

CAAATGGGCGGTAGGCGTGTACGGTGGGAGGTCTATATAAGCAGAGCTCTCTGGCTAACTAGAGAACCCA 

CTGCTTACTGGCTTATCGAAATTAATACGACTCACTATAGGGAGACCCAAGCTGGCTAGCGTTTAAACTT 

AAGCTTACCATGGGCGCCGGCACCCCGGTGACCGCCCCGCTGGCGGGCACTATCTGGAAGGTGCTGGCCA 

GCGAAGGCCAGACGGTGGCCGCAGGCGAGGTGCTGCTGATTCTGGAAGCCATGAAGATGGAAACCGAAAT 

CCGCGCCGCGCAGGCCGGGACCGTGCGCGGTATCGCGGTGAAAGCCGGCGACGCGGTGGCGGTCGGCGAC 

ACCCTGATGACCCTGGCGGGCTCTGGATCCGATCTGTACGACGATGACGATAAGGTACCTAGGATCCAGT 

GTGGTGGAATTGATCCCTTCACCAAGGGCGTCGAGTCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTC 

GACTGTGCCTTCTAGTTGCCAGCCATCTGTTGTTTGCCCCTCCCCCGTGCCTTCCTTGACCCTGGAAGGT 

GCCACTCCCACTGTCCTTTCCTAATAAAATGAGGAAATTGCATCGCATTGTCTGAGTAGGTGTCATTCTA 

TTCTGGGGGGTGGGGTGGGGCAGGACAGCAAGGGGGAGGATTGGGAAGACAATAGCAGGCATGCTGGGGA 

TGCGGTGGGCTCTATGGCTTCTGAGGCGGAAAGAACCAGCTGGGGCTCTAGGGGGTATCCCCACGCGCCC 

TGTAGCGGCGCATTAAGCGCGGCGGGTGTGGTGGTTACGCGCAGCGTGACCGCTACACTTGCCAGCGCCC 

TAGCGCCCGCTCCTTTCGCTTTCTTCCCTTCCTTTCTCGCCACGTTCGCCGGCTTTCCCCGTCAAGCTCT 

AAATCGGGGCATCCCTTTAGGGTTCCGATTTAGTGCTTTACGGCACCTCGACCCCAAAAAACTTGATTAG 

GGTGATGGTTCACGTAGTGGGCCATCGCCCTGATAGACGGTTTTTCGCCCTTTGACGTTGGAGTCCACGT 

TCTTTAATAGTGGACTCTTGTTCCAAACTGGAACAACACTCAACCCTATCTCGGTCTATTCTTTTGATTT 

ATAAGGGATTTTGGGGATTTCGGCCTATTGGTTAAAAAATGAGCTGATTTAACAAAAATTTAACGCGAAT 

TAATTCTGTGGAATGTGTGTCAGTTAGGGTGTGGAAAGTCCCCAGGCTCCCCAGGCAGGCAGAAGTATGC 

AAAGCATGCATCTCAATTAGTCAGCAACCAGGTGTGGAAAGTCCCCAGGCTCCCCAGCAGGCAGAAGTAT 

GCAAAGCATGCATCTCAATTAGTCAGCAACCATAGTCCCGCCCCTAACTCCGCCCATCCCGCCCCTAACT 

CCGCCCAGTTCCGCCCATTCTCCGCCCCATGGCTGACTAATTTTTTTTATTTATGCAGAGGCCGAGGCCG 

CCTCTGCCTCTGAGCTATTCCAGAAGTAGTGAGGAGGCTTTTTTGGAGGCCTAGGCTTTTGCAAAAAGCT 

CCCGGGAGCTTGTATATCCATTTTCGGATCTGATCAGCACGTGTTGACAATTAATCATCGGCATAGTATA 

TCGGCATAGTATAATACGACAAGGTGAGGAACTAAACCATGGCCAAGCCTTTGTCTCAAGAAGAATCCAC 

CCTCATTGAAAGAGCAACGGCTACAATCAACAGCATCCCCATCTCTGAAGACTACAGCGTCGCCAGCGCA 

GCTCTCTCTAGCGACGGCCGCATCTTCACTGGTGTCAATGTATATCATTTTACTGGGGGACCTTGTGCAG 

AACTCGTGGTGCTGGGCACTGCTGCTGCTGCGGCAGCTGGCAACCTGACTTGTATCGTCGCGATCGGAAA 

TGAGAACAGGGGCATCTTGAGCCCCTGCGGACGGTGCCGACAGGTGCTTCTCGATCTGCATCCTGGGATC 

AAAGCCATAGTGAAGGACAGTGATGGACAGCCGACGGCAGTTGGGATTCGTGAATTGCTGCCCTCTGGTT 

ATGTGTGGGAGGGCTAAGCACTTCGTGGCCGAGGAGCAGGACTGACACGTGCTACGAGATTTCGATTCCA 

CCGCCGCCTTCTATGAAAGGTTGGGCTTCGGAATCGTTTTCCGGGACGCCGGCTGGATGATCCTCCAGCG 

CGGGGATCTCATGCTGGAGTTCTTCGCCCACCCCAACTTGTTTATTGCAGCTTATAATGGTTACAAATAA 

AGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTTCACTGCATTCTAGTTGTGGTTTGTCCAAAC 

TCATCAATGTATCTTATCATGTCTGTATACCGTCGACCTCTAGCTAGAGCTTGGCGTAATCATGGTCATA 

GCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAGCATAAAGTGT 

AAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCCGCTTTCCAGT 

CGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTTTGCGTATTGG 

GCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGGTATCAGCT 

CACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAGCAAAAG 
GCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGA 

CGAGCATCACAAAAATCGACGCTCAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCG 

TTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCT 

TTCTCCCTTCGGGAAGCGTGGCGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGT 

TCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTAT 

CGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCA 

GAGCGAGGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGAC 

AGTATTTGGTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCCGGC 

AAACAAACCACCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGAT 

CTCAAGAAGATCCTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGAT 

TTTGGTCATGAGATTATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCA 

ATCTAAAGTATATATGAGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAG 

CGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGG 

CTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCA 

ATAAACCAGCCAGCCGGAAGGGCCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTA 

TTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGC 

TACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGG 

CGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAA 

GTAAGTTGGCCGCAGTGTTATCACTCATGGTTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATC 

CGTAAGATGCTTTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAGAATAGTGTATGCGGCGACCG 

AGTTGCTCTTGCCCGGCGTCAATACGGGATAATACCGCGCCACATAGCAGAACTTTAAAAGTGCTCATCA 

TTGGAAAACGTTCTTCGGGGCGAAAACTCTCAAGGATCTTACCGCTGTTGAGATCCAGTTCGATGTAACC 

CACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGTTTCTGGGTGAGCAAAAACAGGA 

AGGCAAAATGCCGCAAAAAAGGGAATAAGGGCGACACGGAAATGTTGAATACTCATACTCTTCCTTTTTC 

AATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATACATATTTGAATGTATTTAGAAAAA 

TAAACAAATAGGGGTTCCGCGCACATTTCCCCGAAAAGTGCCACCTGACGTC 
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TCGCGCGTTTCGGTGATGACGGTGAAAACCTCTGACACATGCAGCTCCCGGAGACGGTCACAGCTTGTCT 

GTAAGCGGATGCCGGGAGCAGACAAGCCCGTCAGGGCGCGTCAGCGGGTGTTGGCGGGTGTCGGGGCTGG 

CTTAACTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCATATGCGGTGTGAAATACCGCACAGAT 

GCGTAAGGAGAAAATACCGCATCAGGCGCCATTCGCCATTCAGGCTGCGCAACTGTTGGGAAGGGCGATC 

GGTGCGGGCCTCTTCGCTATTACGCCAGCTGGCGAAAGGGGGATGTGCTGCAAGGCGATTAAGTTGGGTA 

ACGCCAGGGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGCCAGTGAATTAATTCGTTGCAGGA 

CAGGATGTGGTGCCCGATGTGACTAGCTCTTTGCTGCAGGCCGTCCTATCCTCTGGTTCCGATAAGAGAC 

CCAGAACTCCGGCCCCCCACCGCCCACCGCCACCCCCATACATATGTGGTACGCAAGTAAGAGTGCCTGC 

GCATGCCCCATGTGCCCCACCAAGAGTTTTGCATCCCATACAAGTCCCCAAAGTGGAGAACCGAACCAAT 

TCTTCGCGGGCAGAACAAAAGCTTCTGCACACGTCTCCACTCGAATTTGGAGCCGGCCGGCGTGTGCAAA 

AGAGGTGAATCGAACGAAAGACCCGTGTGTAAAGCCGCGTTTCCAAAATGTATAAAACCGAGAGCATCTG 

GCCAATGTGCATCAGTTGTGGTCAGCAGCAAAATCAAGTGAATCATCTCAGTGCAACTAAAGGGGGGATC 

TAGCGTTTAAACTTAAGCTTACCATGGGCGCCGGCACCCCGGTGACCGCCCCGCTGGCGGGCACTATCTG 

GAAGGTGCTGGCCAGCGAAGGCCAGACGGTGGCCGCAGGCGAGGTGCTGCTGATTCTGGAAGCCATGAAG 

ATGGAAACCGAAATCCGCGCCGCGCAGGCCGGGACCGTGCGCGGTATCGCGGTGAAAGCCGGCGACGCGG 

TGGCGGTCGGCGACACCCTGATGACCCTGGCGGGCTCTGGATCCGATCTGTACGACGATGACGATAAGGT 

ACATCAAACAAGTTTGTACAAAAAAGCTGAACGAGAAACGTAAAATGATATAAATATCAATATATTAAAT 

TAGATTTTGCATAAAAAACAGACTACATAATACTGTAAAACACAACATATCCAGTCACTATGGCGGCCGC 

ATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATAATGTGTGGATTTTGAGTTAGGATCCG 

GCGAGATTTTCAGGAGCTAAGGAAGCTAAAATGGAGAAAAAAATCACTGGATATACCACCGTTGATATAT 

CCCAATGGCATCGTAAAGAACATTTTGAGGCATTTCAGTCAGTTGCTCAATGTACCTATAACCAGACCGT 

TCAGCTGGATATTACGGCCTTTTTAAAGACCGTAAAGAAAAATAAGCACAAGTTTTATCCGGCCTTTATT 

CACATTCTTGCCCGCCTGATGAATGCTCATCCGGAATTCCGTATGGCAATGAAAGACGGTGAGCTGGTGA 

TATGGGATAGTGTTCACCCTTGTTACACCGTTTTCCATGAGCAAACTGAAACGTTTTCATCGCTCTGGAG 

TGAATACCACGACGATTTCCGGCAGTTTCTACACATATATTCGCAAGATGTGGCGTGTTACGGTGAAAAC 

CTGGCCTATTTCCCTAAAGGGTTTATTGAGAATATGTTTTTCGTCTCAGCCAATCCCTGGGTGAGTTTCA 

CCAGTTTTGATTTAAACGTGGCCAATATGGACAACTTCTTCGCCCCCGTTTTCACCATGGGCAAATATTA 

TACGCAAGGCGACAAGGTGCTGATGCCGCTGGCGATTCAGGTTCATCATGCCGTCTGTGATGGCTTCCAT 

GTCGGCAGAATGCTTAATGAATTACAACAGTACTGCGATGAGTGGCAGGGCGGGGCGTAAACGCGTGGAT 

CCGGCTTACTAAAAGCCAGATAACAGTATGCGTATTTGCGCGCTCGCGAACCGGTGTATACCCGAAGTAT 

GTCAAAAAGAGGTGTGCTATGAAGCAGCGTATTACAGTGACAGTTGACAGCGACAGCTATCAGTTGCTCA 

AGGCATATATGATGTCAATATCTCCGGTCTGGTAAGCACAACCATGCAGAATGAAGCCCGTCGTCTGCGT 

GCCGAACGCTGGAAAGCGGAAAATCAGGAAGGGATGGCTGAGGTCGCCCGGTTTATTGAAATGAACGGCT 

CTTTTGCTGACGAGAACAGGGACTGGTGAAATGCAGTTTAAGGTTTACACCTATAAAAGAGAGAGCCGTT 

ATCGTCTGTTTGTGGATGTACAGAGTGATATTATTGACACGCCCGGGCGACGGATGGTGATCCCCCTGGC 

CAGTGCACGTCTGCTGTCAGATAAAGTCTCCCGTGAACTTTACCCGGTGGTGCATATCGGGGATGAAAGC 

TGGCGCATGATGACCACCGATATGGCCAGTGTGCCGGTCTCCGTTATCGGGGAAGAAGTGGCTGATCTCA 

GCCACCGCGAAAATGACATCAAAAACGCCATTAACCTGATGTTCTGGGGAATATAAATGTCAGGCTCCGT 

TATACACAGCCAGTCTGCAGGTCGACCATAGTGACTGGATATGTTGTGTTTTACAGTATTATGTAGTCTG 

TTTTTTATGCAAAATCTAATTTAATATATTGATATTTATATCATTTTACGTTTCTCGTTCAGCTTTCTTG 

TACAAAGTGGTGATAATTAATTAAGATCTAGAGGGCCCGTTTAAACCCGCTGATCAGCCTCGACTGTGCC 

TTCTAAGATCCAGACATGATAAGATACATTGATGAGTTTGGACAAACCACAACTAGAATGCAGTGAAAAA 

AATGCTTTATTTGTGAAATTTGTGATGCTATTGCTTTATTTGTAACCATTATAAGCTGCAATAAACAAGT 

TAACAACAACAATTGCATTCATTTTATGTTTCAGGTTCAGGGGGAGGTGTGGGAGGTTTTTTAAAGCAAG 

TAAAACCTCTACAAATGTGGTATGGCTGATTATGATCAGTCGACCTGCAGGCATGCAAGCTTGGCGTAAT 

CATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTCCACACAACATACGAGCCGGAAG 

CATAAAGTGTAAAGCCTGGGGTGCCTAATGAGTGAGCTAACTCACATTAATTGCGTTGCGCTCACTGCCC 

GCTTTCCAGTCGGGAAACCTGTCGTGCCAGCTGCATTAATGAATCGGCCAACGCGCGGGGAGAGGCGGTT 

TGCGTATTGGGCGCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGC 

GGTATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATG 
TGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATC 
CGGTAACTATCGTCTTGAGTCCAACCCGGTAAGACACGACTTATCGCCACT... 

...AGCGATCTGTCTATTTCGTTCATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAAC 
TACGGGAGGGCTTACCATCTGGCCCCAGTGCTGCAATGATACCGCGAGACCCACGCTCAC... 

ATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAGTAAGTAGTTCGCCAGTT... 

...GCCATTGCTACAGGCATCGTGGTGTCACGCTCGTCGTTTGGTATGGCTT... 

ACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCAAAAAAGCGGTTAGCTCCTTCGGTC... 

...GAAGTAAGTTGGCCGCAGTGT... 

...CATCCGTAAGATGCTTTTCTGTGACTGG... 

...GACCGAGTTG... 

...ATTGGAAAACGTTCTTCGGGGC... 

CGATGTAACCCACTCGTGCACCCAACTGATCTTCAGCATCTTTTACTTTCACCAGCGT... 

AA...AAGGCAAAATGCCGCAAAAAAG 

TTCCTTTTTCAATATTATTGAAGCATTTATCAGGGTTATTGTCTCATGAGCGGATA... 

...AGA...TAAAC... 

CATTATTATCATGACATTAACCTATAAAAATAGGCGTATCACGAGGCCCTTTCGT 
Shaded regions correspond to those DNA sequences transferred from the 
entry clone into the pET104-DEST™ vector by recombination. Non-shaded 
regions are derived from the pET104-DEST™ vector. 

Bases 568 and 2230 of the pET104-DEST™ sequence are marked. 

The biotin binding site is labeled with a *. 



 121 ATAGGCGCCA GCAACCGCAC CTGTGGCGCC GGTGATGCCG GCCACGATGC GTCCGGCGTA GAGGATCGAG ATCTCGATCC 

     T7 promoter 
 201 CGCGAAATTA ATACGACTCA CTATAGGGGA ATTGTGAGCG GATAACAATT CCCCTCTAGA AATAATTTTG TTTAACTTTA 

     RBS               Biotag™ 
                       Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val 
 281 AGAAGGAGAT ATACAT ATG GGC GCC GGC ACC CCG GTG ACC GCC CCG CTG GCG GGC ACT ATC TGG AAG GTG 

     Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala Met Lys Met 
 351 CTG GCC AGC GAA GGC CAG ACG GTG GCC GCA GGC GAG GTG CTG CTG ATT CTG GAA GCC ATG AAG ATG 
     Biotin binding site * 

     Glu Thr Glu Ile Arg Ala Ala Gln Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala 
 417 GAA ACC GAA ATC CGC GCC GCG CAG GCC GGG ACC GTG CGC GGT ATC GCG GTG AAA GCC GGC GAC GCG 
     Biotag™ forward priming site 

     Val Ala Val Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp 
 483 GTG GCG GTC GGC GAC ACC CTG ATG ACC CTG GCG GGC TCT GGA TCC GAT CTG TAC GAC GAT GAC GAT 
     EK recognition site 

                              568 
     Lys Gly Ile Ile Thr Ser Leu Tyr Lys Lys Ala Gly ... 
 549 AAG GGA ATT ATC ACA AGT TTG TAC AAA AAA GCA GGC ... 
         CCT TAA TAG TGT TCA AAC ATG TTT ... 
     EK cleavage site 

     2230 
     Gly ... --- ... TTGTACAAAG 
                           TTTC 

2241 TGGTGATAAT TAATTAAGAT AGCTCAGATC CGGCTGCTAA CAAAGCCCGA AAGGAAGCTG AGTTGGCTGC TGCCACCGCT 
     ACCACTATTA 
     T7 reverse priming site 

2321 GAGCAATAAC TAGCATAACC 

Fig. 16 
Topoisomerase 

----CCCTT                  CACC ATG NNN --- --- --- NNN AAG GG---- 
----GGGAAGTGG              GTGG TAC NNN --- --- --- NNN TTC CC---- 

         Overhang          PCR product 

Overhang invades double-stranded 
DNA, displacing the bottom strand. 

----CCCTTCACC ATG NNN --- --- --- NNN AAG GG---- 
----GGGAAGTGG TAC NNN --- --- --- NNN TTC CC---- 

Topoisomerase 

Fig. 17 
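
The base pairing that Fig. 17 depends on can be checked mechanically. The following is a minimal Python sketch and is not part of the original disclosure. The function names are hypothetical, and the only inputs assumed are the two sequences drawn in the figure: a GTGG overhang on the bottom strand of the vector and a PCR product whose top strand begins with CACC.

```python
# Illustrative sketch only: the helper names below are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Bottom-strand overhang of the vector as drawn in Fig. 17.
VECTOR_OVERHANG = "GTGG"


def complement(seq: str) -> str:
    """Return the base-by-base complement, kept in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def can_invade(pcr_product_top_strand: str) -> bool:
    """True if the first four bases of the PCR product pair with the vector overhang."""
    return complement(pcr_product_top_strand[:4]) == VECTOR_OVERHANG


if __name__ == "__main__":
    # A product that starts with CACC ATG, as in the figure.
    print(can_invade("CACCATGGGCGCC"))
    # A product lacking the CACC lead cannot pair with the overhang.
    print(can_invade("ATGGGCGCC"))
```

Because the overhang pairs only with CACC, a product carrying that lead at one end is expected to enter the vector in a single orientation, which is the point the figure illustrates.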
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Flow Chart 



The flow chart below describes the general steps required to clone and express your 
blunt-end PCR product. 



Determine strategy for PCR 



Produce blunt-end PCR product 
using properly designed PCR primers 



TOPO® Cloning Reaction: 
Mix together PCR product and pET104/D-TOPO® vector 



Incubate 5 minutes 
at room temperature 



Transform into TOP10 E. coli cells 



Select and analyze colonies 



Choose a positive transformant and 
isolate plasmid DNA 



Transform BL21 Star™(DE3) 
and induce expression with IPTG 
 121 ATAGGCGCCA GCAACCGCAC CTGTGGCGCC GGTGATGCCG GCCACGATGC GTCCGGCGTA GAGGATCGAG ATCTCGATCC 
     T7 promoter/priming site 

     T7 promoter                                 lac operator 

 201 CGCGAAATTA ATACGACTCA CTATAGGGGA ATTGTGAGCG GATAACAATT CCCCTCTAGA AATAATTTTG TTTAACTTTA 

     RBS               Biotag™ 

 281 AGAAGGAGAT ATACAT ATG GGC GCC GGC ACC CCG GTG ACC GCC CCG CTG GCG GGC ACT ATC TGG AAG GTG 
                       Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val 

 351 CTG GCC AGC GAA GGC CAG ACG GTG GCC GCA GGC GAG GTG CTG CTG ATT CTG GAA GCC ATG AAG ATG 
     Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala Met Lys Met 
     Biotin binding site * 

 417 GAA ACC GAA ATC CGC GCC GCG CAG GCC GGG ACC GTG CGC GGT ATC GCG GTG AAA GCC GGC GAC GCG 

     Glu Thr Glu Ile Arg Ala Ala Gln Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala 

     Biotag™ forward priming site 

 483 GTG GCG GTC GGC GAC ACC CTG ATG ACC CTG GCG GGC TCT GGA TCC GAT CTG TAC GAC GAT GAC GAT 
     Val Ala Val Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp 

     EK recognition site 

 549 AAG GGA ATT GAT CCC TT ... AAGGGCGAGCT CAGATCCGGC TGCTAACAAA GCCCGAAAGG 
     Lys Gly Ile Asp Pro 

     EK cleavage site                T7 reverse priming site 

 611 AAGCTGAGTT GGCTGCTGCC ACCGCTGAGC AATAACTAGC 
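
The amino acid rows in the figure above can be re-derived from the codon rows with the standard genetic code. The following is a minimal Python sketch and is not part of the original disclosure. The helper names are hypothetical, and the input is copied from the codons shown at positions 516 to 551 of the figure (the end of the row beginning at 483 plus the AAG at 549).

```python
# Illustrative sketch only: the helper names below are hypothetical.
BASES = "TCAG"
# Standard genetic code, listed for codons enumerated in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) into one-letter amino acids."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Codons copied from the figure: Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp Lys.
tag_end = "GGC TCT GGA TCC GAT CTG TAC GAC GAT GAC GAT AAG"
protein = translate(tag_end)

# Enterokinase (EK) recognizes Asp-Asp-Asp-Asp-Lys; locate it in the translation.
ek_site_start = protein.find("DDDDK")

if __name__ == "__main__":
    print(protein)
    print(ek_site_start)
```

The same function can be pointed at any other codon row of the figure to cross-check an amino acid line that is hard to read.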
Shaded regions correspond to those DNA sequences transferred from the 
entry clone into the pcDNA6/Biotag-DEST™ vector by recombination. Non- 
shaded regions are derived from the pcDNA6/Biotag-DEST™ vector. 
Bases 1191 and 2853 of the pcDNA6/Biotag-DEST™ sequence are marked. 
The biotin binding site is labeled with a *. 
Potential stop codons are underlined. 



     3' end of CMV promoter 

 761 CCCATTGACG CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA 

     T7 promoter/priming site 

                                                                                         Met 
 841 CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC GTTTAAACTT AAGCTTACC ATG 

     Biotag™ 

     Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val Leu Ala Ser Glu Gly 
 923 GGC GCC GGC ACC CCG GTG ACC GCC CCG CTG GCG GGC ACT ATC TGG AAG GTG CTG GCC AGC GAA GGC 

     Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg 
 989 CAG ACG GTG GCC GCA GGC GAG GTG CTG CTG ATT CTG GAA GCC ATG AAG ATG GAA ACC GAA ATC CGC 
     Biotin binding site * 

     Ala Ala Gln Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val Gly Asp 
1055 GCC GCG CAG GCC GGG ACC GTG CGC GGT ATC GCG GTG AAA GCC GGC GAC GCG GTG GCG GTC GGC GAC 
     Biotag™ forward priming site 

     Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp Lys Val His Gln Thr 
1121 ACC CTG ATG ACC CTG GCG GGC TCT GGA TCC GAT CTG TAC GAC GAT GAC GAT AAG GTA CAT CAA ACA 
                                                                         CAT GTA GTT TGT 
     EK recognition site                 EK cleavage site 

          1191 
     Ser Leu Tyr Lys Lys Ala Gly ... 
1187 AGT TTG TAC AAA AAA GCA GGC ... 
     TCA AAC ATG 

     2853 
     ... CTTGTACA AAGTGGTGAT AATTAATTAA 
            TTCACCACTA TTAATTAATT 

     BGH reverse priming site 

2881 GATCTAGAGG GCCCGTTTAA ACCCGCTGAT CAGCCTCGAC TGTGCCTTCT AGTTGCCAGC CATCTGTTGT 



Fi 3 . 310 
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Flow Chart 



The flow chart below outlines the experimental steps necessary to clone and 
express your blunt-end PCR product. 



Prepare purified plasmid for transfection 



Transfect mammalian cell line and 
test for expression of gene of interest 
     CAAT     TATA     3' end of CMV promoter 

 761 CCCATTGACG CAAATGGGCG GTAGGCGTGT ACGGTGGGAG GTCTATATAA GCAGAGCTCT CTGGCTAACT AGAGAACCCA 

     T7 promoter/priming site 

 841 CTGCTTACTG GCTTATCGAA ATTAATACGA CTCACTATAG GGAGACCCAA GCTGGCTAGC GTTTAAACTT AAGCTTACC ATG 
                                                                                         Met 

     Biotag™ 

 923 GGC GCC GGC ACC CCG GTG ACC GCC CCG CTG GCG GGC ACT ATC TGG AAG GTG CTG GCC AGC GAA GGC 
     Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val Leu Ala Ser Glu Gly 

 989 CAG ACG GTG GCC GCA GGC GAG GTG CTG CTG ATT CTG GAA GCC ATG AAG ATG GAA ACC GAA ATC CGC 
     Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg 

1055 GCC GCG CAG GCC GGG ACC GTG CGC GGT ATC GCG GTG AAA GCC GGC GAC GCG GTG GCG GTC GGC GAC 
     Ala Ala Gln Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val Gly Asp 

                                                                     Asp718 I  Kpn I 
1121 ACC CTG ATG ACC CTG GCG GGC TCT GGA TCC GAT CTG TAC GAC GAT GAC GAT AAG GTA CCT AGG ATC 
     Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp Lys Val Pro Arg Ile 

     EK recognition site                 EK cleavage site 

                                                      Xba I   Apa I 
1187 CAG TGT GGT GGA ATT GAT CCC TTC ACC ... AAGGGCG TCGAGTCTAG AGGGCCCGTT TAAACCCGCT 
     Gln Cys Gly Gly Ile Asp Pro Phe Thr 

     BGH reverse priming site 

1251 GATCAGCCTC GACTGTGCCT TCTAGTTGCC AGCCATCTGT TGTTTGCCCC 
• Shaded regions correspond to those DNA sequences transferred from the 
entry clone into the pMT/Biotag™-DEST vector by recombination. Non- 
shaded regions are derived from the pMT/Biotag™-DEST vector. 

• Bases 1135 and 2797 of the pMT/Biotag™-DEST sequence are marked. 

• The biotin binding site is labeled with a *. 

• Potential stop codons are underlined. 



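     5' end of metallothionein promoter 

     Metal regulatory region 

 411 CGTTGCAGGA CAGGATGTGG TGCCCGATGT GACTAGCTCT TTGCTGCAGG CCGTCCTATC CTCTGGTTCC GATAAGAGAC CCAGAACTCC 

 501 GGCCCCCCAC CGCCCACCGC CACCCCCATA CATATGTGGT ACGCAAGTAA GAGTGCCTGC GCATGCCCCA TGTGCCCCAC CAAGAGTTTT 

     Metal regulatory regions 

 591 GCATCCCATA CAAGTCCCCA AAGTGGAGAA CCGAACCAAT TCTTCGCGGG CAGAACAAAA GCTTCTGCAC ACGTCTCCAC TCGAATTTGG 

     TATA 

 681 AGCCGGCCGG CGTGTGCAAA AGAGGTGAAT CGAACGAAAG ACCCGTGTGT AAAGCCGCGT TTCCAAAATG TATAAAACCG AGAGCATCTG 
     Start of transcription 

 771 GCCAATGTGC ATCAGTTGTG GTCAGCAGCA AAATCAAGTG AATCATCTCA GTGCAACTAA AGGGGGGATC TAGCGTTTAA ACTTAAGCTT 
     Biotag™ 

         Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val Leu Ala Ser Glu 
 861 ACC ATG GGC GCC GGC ACC CCG GTG ACC GCC CCG CTG GCG GGC ACT ATC TGG AAG GTG CTG GCC AGC GAA 

     Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg 
 930 GGC CAG ACG GTG GCC GCA GGC GAG GTG CTG CTG ATT CTG GAA GCC ATG AAG ATG GAA ACC GAA ATC CGC 

     Ala Ala Gln Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val Gly Asp Thr 
 999 GCC GCG CAG GCC GGG ACC GTG CGC GGT ATC GCG GTG AAA GCC GGC GAC GCG GTG GCG GTC GGC GAC ACC 

     EK recognition site                 EK cleavage site                          1135 
     Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp Lys Val His Gln Thr Ser Leu 
1068 CTG ATG ACC CTG GCG GGC TCT GGA TCC GAT CTG TAC GAC GAT GAC GAT AAG GTA CAT CAA ACA AGT TTG 
                                                                     CAT GTA GTT TGT TCA AAC 

     Tyr Lys Lys Ala Gly ... 
1137 TAC AAA AAA GCA GGC ... 

     2797 
     ... CTTGTACAAA GTGG TGATAATTAA TTAAGATCTA GAGGGCCCGT 
               TTTCACC ACTATTAATT 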
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IN THE PCT/US RECEIVING OFFICE 
UNDER THE PATENT COOPERATION TREATY 

In re International Application of: INVITROGEN CORPORATION 

International Application No.: PCT/US03/21339 

International Filing Date: 08 July 2003 

Title: METHODS AND COMPOSITIONS FOR THE PRODUCTION, IDENTIFICATION AND 
PURIFICATION OF FUSION PROTEINS 

Commissioner for Patents 

Mail Stop PCT 

Alexandria, VA 22313-1450 



SUBMISSION OF FORMAL DRAWINGS 
UNDER PCT RULE 11.13 

Sir: 

In accordance with PCT Rule 11.13, the undersigned hereby submits 32 sheets of formal 
drawings (comprising figures 1 through 25) in connection with the above international application. 
The undersigned requests that the formal figures be transmitted to the International Bureau as soon 
as possible for technical preparations of the international publication document. 



Respectfully submitted, 




Robert W. Esmond 
Attorney for Applicant 
Reg. No. 32,893 

Date: 13 November 2003 

RWE/FRC/CAG/lmw 
Enclosures: 

Formal Drawings - 32 Sheets 
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IN THE INTERNATIONAL SEARCHING AUTHORITY (ISA/US) 
UNDER THE PATENT COOPERATION TREATY 

In re International Application of: INVITROGEN CORPORATION 
International Application No.: PCT/US03/21339 
International Filing Date: 08 July 2003 

Title: METHODS AND COMPOSITIONS FOR THE PRODUCTION, 
IDENTIFICATION AND PURIFICATION OF FUSION PROTEINS 

Commissioner for Patents 

Mail Stop PCT 

Alexandria, VA 22313-1450 

ATTENTION: Lissie Marquis 

SUBMISSION OF NUCLEOTIDE 
AND/OR AMINO ACID SEQUENCE LISTING COMPLYING WITH STANDARD 
(under PCT Rule 13ter.1(a) and (c) and Administrative Instructions, 
Section 208 and Annex C) 

Ms. Marquis: 


The undersigned hereby submits a diskette containing the sequence listing and a paper 
copy thereof. The undersigned confirms that the diskette containing the sequence listing does 
not include matter which goes beyond the disclosure of the international application as filed. 
Moreover, the paper copy of the sequence listing and the computer readable form are the same. 



Respectfully submitted, 




Robert W. Esmond 
Attorney for Applicant 
Reg. No. 32,893 

Date: 13 November 2003 

RWE/FRC/CAG/lmw 
Enclosures: 

Paper Copy of Sequence Listing 
Diskette 
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SEQUENCE LISTING 

<110> Invitrogen Corporation 

<120> Methods and Compositions for the Production, Identification and 
Purification of Fusion Proteins 

<130> 0942.551PC03 

<140> PCT/US03/21339 

<141> 2003-07-08 

<150> 60/393,756 

<151> 2002-07-08 

<150> 60/396,627 

<151> 2002-07-19 

<150> 60/417,172 

<151> 2002-10-10 

<160> 34 

<170> PatentIn version 3.2 

<210> 1 

<211> 7618 

<212> DNA 

<213> Artificial 

<220> 

<223> pET104-DEST 

<400> 1 



caaggagatg gcgcccaaca gtcccccggc cacggggcct gccaccatac ccacgccgaa     60 

acaagcgctc atgagcccga agtggcgagc ccgatcttcc ccatcggtga tgtcggcgat    120 

ataggcgcca gcaaccgcac ctgtggcgcc ggtgatgccg gccacgatgc gtccggcgta    180 

gaggatcgag atctcgatcc cgcgaaatta atacgactca ctatagggga attgtgagcg    240 

gataacaatt cccctctaga aataattttg tttaacttta agaaggagat atacatatgg    300 

gcgccggcac cccggtgacc gccccgctgg cgggcactat ctggaaggtg ctggccagcg    360 

aaggccagac ggtggccgca ggcgaggtgc tgctgattct ggaagccatg aagatggaaa    420 

ccgaaatccg cgccgcgcag gccgggaccg tgcgcggtat cgcggtgaaa gccggcgacg    480 

cggtggcggt cggcgacacc ctgatgaccc tggcgggctc tggatccgat ctgtacgacg    540 

atgacgataa gggaattatc acaagtttgt acaaaaaagc tgaacgagaa acgtaaaatg    600 

atataaatat caatatatta aattagattt tgcataaaaa acagactaca taatactgta    660 

aaacacaaca tatccagtca ctatggcggc cgcattaggc accccaggct ttacacttta    720 

tgcttccggc tcgtataatg tgtggatttt gagttaggat ccggcgagat tttcaggagc    780 

taaggaagct aaaatggaga aaaaaatcac tggatatacc accgttgata tatcccaatg    840 

gcatcgtaaa gaacattttg aggcatttca gtcagttgct caatgtacct ataaccagac    900 

cgttcagctg gatattacgg cctttttaaa gaccgtaaag aaaaataagc acaagtttta    960 

tccggccttt attcacattc ttgcccgcct gatgaatgct catccggaat tccgtatggc   1020 

aatgaaagac ggtgagctgg tgatatggga tagtgttcac ccttgttaca ccgttttcca   1080 

tgagcaaact gaaacgtttt catcgctctg gagtgaatac cacgacgatt tccggcagtt   1140 

tctacacata tattcgcaag atgtggcgtg ttacggtgaa aacctggcct atttccctaa   1200 

agggtttatt gagaatatgt ttttcgtctc agccaatccc tgggtgagtt tcaccagttt   1260 

tgatttaaac gtggccaata tggacaactt cttcgccccc gttttcacca tgggcaaata   1320 

ttatacgcaa ggcgacaagg tgctgatgcc gctggcgatt caggttcatc atgccgtctg   1380 

tgatggcttc catgtcggca gaatgcttaa tgaattacaa cagtactgcg atgagtggca   1440 

gggcggggcg taaacgcgtg gatccggctt actaaaagcc agataacagt atgcgtattt   1500 

gcgcgcaccg gtgctagcgt atacccgaag tatgtcaaaa agaggtgtgc tatgaagcag   1560 

cgtattacag tgacagttga cagcgacagc tatcagttgc tcaaggcata tatgatgtca   1620 

atatctccgg tctggtaagc acaaccatgc agaatgaagc ccgtcgtctg cgtgccgaac   1680 

gctggaaagc ggaaaatcag gaagggatgg ctgaggtcgc ccggtttatt gaaatgaacg   1740 

gctcttttgc tgacgagaac agggactggt gaaatgcagt ttaaggttta cacctataaa   1800 

agagagagcc gttatcgtct gtttgtggat gtacagagtg atattattga cacgcccggg   1860 

cgacggatgg tgatccccct ggccagtgca cgtctgctgt cagataaagt ctcccgtgaa   1920 

ctttacccgg tggtgcatat cggggatgaa agctggcgca tgatgaccac cgatatggcc   1980 

agtgtgccgg tctccgttat cggggaagaa gtggctgatc tcagccaccg cgaaaatgac   2040 

atcaaaaacg ccattaacct gatgttctgg ggaatataaa tgtcaggctc cgttatacac   2100 

agccagtctg caggtcgacc atagtgactg gatatgttgt gttttacagt attatgtagt   2160 

ctgtttttta tgcaaaatct aatttaatat attgatattt atatcatttt acgtttctcg   2220 

ttcagctttc ttgtacaaag tggtgataat taattaagat agctcagatc cggctgctaa   2280 

caaagcccga aaggaagctg agttggctgc tgccaccgct gagcaataac tagcataacc   2340 

ccttggggcc tctaaacggg tcttgagggg ttttttgctg aaaggaggaa ctatatccgg   2400 

atatcccgca agaggcccgg cagtaccggc ataaccaagc ctatgcctac agcatccagg   2460 

gtgacggtgc cgaggatgac gatgagcgca ttgttagatt tcatacacgg tgcctgactg   2520 

cgttagcaat ttaactgtga taaactaccg cattaaagct agcttatcga tgataagctg   2580 

tcaaacatga gaattaattc ttgaagacga aagggcctcg tgatacgcct atttttatag   2640 

gttaatgtca tgataataat ggtttcttag acgtcaggtg gcacttttcg gggaaatgtg   2700 
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cgcggaaccc 


ctatttgttt 


atttttctaa 


dLdUat UUdd 


auauyuauuu 


y u uu a tyay a 


2760 


caataaccct 


gataaatgct 


tcaataatat 


f- r~r 3 O O O O /"TPT a 

tyaaaaayga 


arrant" a t*nan 

ay ay ud uy ay 


hahhpaapaf 
uauuuaauau 




ttccgtgtcg 


cccttattcc 


cttttttgcg 




uuuuuyuuuu 


t"p;p t"Papppa 
uyuuuauuua 


? ft RfJ 


gaaacgctgg 


tgaaagtaaa 


agatgctgaa 


yaULay u uyy 


yuyuauyay u 


nnnfr" t"arat*p 

yyy *— uauanw 


2940 


gaactggatc 


tcaacagcgg taagatcctt 


gagagttttc


gccccgaaga


acgttttcca


3000


atgatgagca 


cttttaaagt 


tctgctatgt 


ggcgcggtat


tatcccgtgt


tgacgccggg


3060


caagagcaac 


tcggtcgccg 


catacactat 


tctcagaatg


acttggttga


gtactcacca


3120


gtcacagaaa 


agcatcttac 


ggatggcatg 


acagtaagag


aattatgcag


tgctgccata


3180 


accatgagtg 


ataacactgc 


ggccaactta 


cttctgacaa


cgatcggagg


accgaaggag


3240 


ctaaccgctt 


ttttgcacaa 


catgggggat 


catgtaactc


gccttgatcg


ttgggaaccg


3300


gagctgaatg aagccatacc 


aaacgacgag 


cgcgacacca 


cgatgcctgc


agcaatggca


3360


acaacgttgc 


gcaaactatt 


aactggcgaa 


ctacttactc


tagcttcccg


gcaacaatta


3420


atagactgga 


tggaggcgga 


taaagttgca 


ggaccacttc


tgcgctcggc


ccttccggct


3480


ggctggttta 


ttgctgataa 


atctggagcc 


ggtgagcgtg 


ggtctcgcgg


tatcattgca


3540


gcactggggc 


cagatggtaa 


gccctcccgt 


atcgtagtta


tctacacgac


ggggagtcag


3600


gcaactatgg 


atgaacgaaa 


tagacagatc 


gctgagatag


gtgcctcact


gattaagcat


3660 


tggtaactgt 


cagaccaagt 


ttactcatat 


atactttaga


ttgatttaaa


acttcatttt


3720 


taatttaaaa 


ggatctaggt 


gaagatcctt 


tttgataatc


tcatgaccaa


aatcccttaa


3780


cgtgagtttt 


cgttccactg 


agcgtcagac 


cccgtagaaa


agatcaaagg


atcttcttga


3840 


gatccttttt 


ttctgcgcgt 


aatctgctgc 


ttgcaaacaa


aaaaaccacc


gctaccagcg


3900 


gtggtttgtt 


tgccggatca 


agagctacca


actctttttc


cgaaggtaac


tggcttcagc


3960 


agagcgcaga 


taccaaatac 


tgtccttcta 


gtgtagccgt


agttaggcca


ccacttcaag


4020 


aactctgtag 


caccgcctac 


atacctcgct 


ctgctaatcc


tgttaccagt


ggctgctgcc


4080 


agtggcgata 


agtcgtgtct 


taccgggttg 


gactcaagac


gatagttacc


ggataaggcg


4140 


cagcggtcgg gctgaacggg gggttcgtgc 


acacagccca


gcttggagcg


aacgacctac


4200 


accgaactga 


gatacctaca 


gcgtgagcta 


tgagaaagcg


ccacgcttcc


cgaagggaga


4260 


aaggcggaca 


ggtatccggt 


aagcggcagg 


gtcggaacag 


gagagcgcac 


gagggagctt 


4320 


ccagggggaa 


acgcctggta 


tctttatagt 


cctgtcgggt 


ttcgccacct 


ctgacttgag 


4380 


cgtcgatttt 


tgtgatgctc 


gtcagggggg 


cggagcctat 


ggaaaaacgc 


cagcaacgcg 


4440 


gcctttttac 


ggttcctggc cttttgctgg ccttttgctc 


acatgttctt 


tcctgcgtta 


4500 


tcccctgatt 


ctgtggataa ccgtattacc gcctttgagt 


gagctgatac 


cgctcgccgc 


4560 
agccgaacga


ccgagcgcag 


cgagtcagtg 


agcgaggaag 


cggaagagcg 


cctgatgcgg 


4620 


tattttctcc 


ttacgcatct 


gtgcggtatt 


tcacaccgca 


tatatggtgc 


actctcagta 


4680 


caatctgctc 


tgatgccgca 


tagttaagcc 


agtatacact 


ccgctatcgc 


tacgtgactg 


4740 


ggtcatggct 


gcgccccgac 


acccgccaac 


acccgctgac 


gcgccctgac 


gggcttgtct 


4800 


gctcccggca 


tccgcttaca 


gacaagctgt 


gaccgtctcc 


gggagctgca 


tgtgtcagag 


4860 


gttttcaccg 


tcatcaccga 


aacgcgcgag 


gcagctgcgg 


taaagctcat 


cagcgtggtc 


4920 


gtgaagcgat 


tcacagatgt 


ctgcctgttc 


atccgcgtcc 


agctcgttga 


gtttctccag 


4980 


aagcgttaat 


gtctggcttc 


tgataaagcg 


ggccatgtta 


agggcggttt 


tttcctgttt 


5040 


ggtcactgat 


gcctccgtgt 


aagggggatt 


tctgttcatg 


ggggtaatga 


taccgatgaa 


5100 


acgagagagg 


atgctcacga 


tacgggttac 


tgatgatgaa 


catgcccggt 


tactggaacg 


5160 


ttgtgagggt 


aaacaactgg 


cggtatggat 


gcggcgggac 


cagagaaaaa 


tcactcaggg 


5220 


tcaatgccag 


cgcttcgtta 


atacagatgt 


aggtgttcca 


cagggtagcc 


agcagcatcc 


5280 


tgcgatgcag 


atccggaaca 


taatggtgca 


gggcgctgac 


ttccgcgttt 


ccagacttta 


5340 


cgaaacacgg 


aaaccgaaga 


ccattcatgt 


tgttgctcag 


gtcgcagacg 


ttttgcagca 


5400 


gcagtcgctt 


cacgttcgct 


cgcgtatcgg 


tgattcattc 


tgctaaccag 


taaggcaacc 


5460 


ccgccagcct 


agccgggtcc 


tcaacgacag 


gagcacgatc 


atgcgcaccc 


gtggccagga 


5520 


cccaacgctg 


cccgagatgc 


gccgcgtgcg 


gctgctggag 


atggcggacg 


cgatggatat 


5580 


gttctgccaa 


gggttggttt 


gcgcattcac 


agttctccgc 


aagaattgat 


tggctccaat 


5640 


tcttggagtg 


gtgaatccgt 


tagcgaggtg 


ccgccggctt 


ccattcaggt 


cgaggtggcc 


5700 


cggctccatg 


caccgcgacg 


caacgcgggg 


aggcagacaa 


ggtatagggc 


ggcgcctaca 


5760 


atccatgcca 


acccgttcca 


tgtgctcgcc 


gaggcggcat 


aaatcgccgt 


gacgatcagc 


5820 


ggtccagtga 


tcgaagttag 


gctggtaaga 


gccgcgagcg 


atccttgaag 


ctgtccctga 


5880 


tggtcgtcat 


ctacctgcct 


ggacagcatg 


gcctgcaacg 


cgggcatccc 


gatgccgccg 


5940


gaagcgagaa 


gaatcataat 


ggggaaggcc 


atccagcctc 


gcgtcgcgaa 


cgccagcaag 


6000 


acgtagccca 


gcgcgtcggc 


cgccatgccg 


gcgataatgg 


cctgcttctc 


gccgaaacgt 


6060 


ttggtggcgg 


gaccagtgac 


gaaggcttga 


gcgagggcgt 


gcaagattcc 


gaataccgca 


6120 


agcgacaggc 


cgatcatcgt 


cgcgctccag 


cgaaagcggt 


cctcgccgaa 


aatgacccag 


6180 


agcgctgccg 


gcacctgtcc 


tacgagttgc 


atgataaaga 


agacagtcat 


aagtgcggcg 


6240 


acgatagtca 


tgccccgcgc 


ccaccggaag 


gagctgactg 


ggttgaaggc 


tctcaagggc 


6300 


atcggtcgag 


atcccggtgc 


ctaatgagtg 


agctaactta 


cattaattgc 


gttgcgctca 


6360 


ctgcccgctt 


tccagtcggg 


aaacctgtcg 


tgccagctgc 


attaatgaat 


cggccaacgc 


6420 
gcggggagag


gcggtttgcg


tattgggcgc


cagggtggtt


tttcttttca


ccagtgagac


6480


gggcaacagc 


tgattgccct


tcaccgcctg


gccctgagag


agttgcagca


agcggtccac


6540


gctggtttgc 


cccagcaggc 


gaaaatcctg 


tttgatggtg


gttaacggcg


ggatataaca


6600


tgagctgtct 


tcggtatcgt


cgtatcccac 


taccgagata


tccgcaccaa


cgcgcagccc


6660


ggactcggta 


atggcgcgca 


ttgcgcccag 


cgccatctga


tcgttggcaa


ccagcatcgc


6720


agtgggaacg 


atgccctcat


tcagcatttg


catggtttgt


tgaaaaccgg


acatggcact


6780


ccagtcgcct 


tcccgttccg 


ctatcggctg


aatttgattg


cgagtgagat


atttatgcca


6840


gccagccaga 


cgcagacgcg 


ccgagacaga 


acttaatggg


cccgctaaca


gcgcgatttg


6900


ctggtgaccc 


aatgcgacca


gatgctccac


gcccagtcgc


gtaccgtctt


catgggagaa


6960


aataatactg 


ttgatgggtg 


tctggtcaga 


gacatcaaga 


aataacgccg


gaacattagt


7020


gcaggcagct 


tccacagcaa 


tggcatcctg 


gtcatccagc 


ggatagttaa 


tgatcagccc


7080


actgacgcgt 


tgcgcgagaa 


gattgtgcac 


cgccgcttta 


caggcttcga 


cgccgcttcg 


7140


ttctaccatc 


gacaccacca 


cgctggcacc 


cagttgatcg 


gcgcgagatt


taatcgccgc


7200


gacaatttgc 


gacggcgcgt


gcagggccag


actggaggtg 


gcaacgccaa 


tcagcaacga


7260


ctgtttgccc 


gccagttgtt


gtgccacgcg 


gttgggaatg 


taattcagct


ccgccatcgc


7320


cgcttccact 


ttttcccgcg 


ttttcgcaga


aacgtggctg 


gcctggttca 


ccacgcggga 


7380 


aacggtctga 


taagagacac 


cggcatactc 


tgcgacatcg


tataacgtta


ctggtttcac 


7440 


attcaccacc 


ctgaattgac 


tctcttccgg 


gcgctatcat


gccataccgc 


gaaaggtttt 


7500 


gcgccattcg 


atggtgtccg 


ggatctcgac 


gctctccctt 


atgcgactcc 


tgcattagga


7560


agcagcccag 


tagtaggttg 


aggccgttga 


gcaccgccgc 


cgcaaggaat 


ggtgcatg 


7618 


<210> 2 
<211> 5934 
<212> DNA 
<213> Artificial 












<220> 

<223> pET104/D-TOPO












<400> 2

caaggagatg gcgcccaaca 


gtcccccggc 


cacggggcct


gccaccatac


ccacgccgaa


60 


acaagcgctc 


atgagcccga


agtggcgagc 


ccgatcttcc 


ccatcggtga 


tgtcggcgat


120 


ataggcgcca 


gcaaccgcac 


ctgtggcgcc 


ggtgatgccg 


gccacgatgc 


gtccggcgta 


180 


gaggatcgag 


atctcgatcc 


cgcgaaatta


atacgactca 


ctatagggga 


attgtgagcg 


240 


gataacaatt 


cccctctaga 


aataattttg 


tttaacttta 


agaaggagat 


atacatatgg 


300 


gcgccggcac 


cccggtgacc 


gccccgctgg 


cgggcactat


ctggaaggtg 


ctggccagcg 


360 
aaggccagac


ggtggccgca 


ggcgaggtgc 


tgctgattct 


ggaagccatg 


aagatggaaa


420 


ccgaaatccg 


cgccgcgcag 


gccgggaccg 


tgcgcggtat 


cgcggtgaaa 


gccggcgacg 


480 


cggtggcggt 


cggcgacacc 


ctgatgaccc 


tggcgggctc 


tggatccgat 


ctgtacgacg


540 


atgacgataa 


gggaattgat 


cccttcacca 


agggcgagct 


cagatccggc 


tgctaacaaa 


600 


gcccgaaagg 


aagctgagtt 


ggctgctgcc 


accgctgagc 


aataactagc 


ataacccctt 


660 


ggggcctcta 


aacgggtctt 


gaggggtttt 


ttgctgaaag 


gaggaactat 


atccggatat 


720 


cccgcaagag 


gcccggcagt 


accggcataa 


ccaagcctat 


gcctacagca 


tccagggtga 


780 


cggtgccgag 


gatgacgatg 


agcgcattgt 


tagatttcat 


acacggtgcc 


tgactgcgtt 


840 


agcaatttaa 


ctgtgataaa 


ctaccgcatt 


aaagctagct 


tatcgatgat 


aagctgtcaa 


900 


acatgagaat 


taattcttga 


agacgaaagg 


gcctcgtgat 


acgcctattt 


ttataggtta 


960 


atgtcatgat 


aataatggtt 


tcttagacgt 


caggtggcac 


ttttcgggga 


aatgtgcgcg 


1020 


gaacccctat 


ttgtttattt 


ttctaaatac 


attcaaatat 


gtatccgctc 


atgagacaat 


1080 


aaccctgata 


aatgcttcaa 


taatattgaa 


aaaggaagag 


tatgagtatt 


caacatttcc 


1140 


gtgtcgccct 


tattcccttt 


tttgcggcat 


tttgccttcc 


tgtttttgct 


cacccagaaa 


1200 


cgctggtgaa 


agtaaaagat 


gctgaagatc 


agttgggtgc 


acgagtgggt 


tacatcgaac 


1260 


tggatctcaa 


cagcggtaag 


atccttgaga 


gttttcgccc 


cgaagaacgt 


tttccaatga 


1320 


tgagcacttt 


taaagttctg 


ctatgtggcg 


cggtattatc 


ccgtgttgac 


gccgggcaag 


1380 


agcaactcgg 


tcgccgcata 


cactattctc 


agaatgactt 


ggttgagtac 


tcaccagtca 


1440 


cagaaaagca 


tcttacggat 


ggcatgacag 


taagagaatt 


atgcagtgct 


gccataacca 


1500 


tgagtgataa 


cactgcggcc 


aacttacttc 


tgacaacgat 


cggaggaccg 


aaggagctaa 


1560 


ccgctttttt 


gcacaacatg 


ggggatcatg 


taactcgcct 


tgatcgttgg 


gaaccggagc 


1620 


tgaatgaagc 


cataccaaac 


gacgagcgtg 


acaccacgat 


gcctgcagca 


atggcaacaa 


1680 


cgttgcgcaa 


actattaact 


ggcgaactac 


ttactctagc 


ttcccggcaa 


caattaatag 


1740 


actggatgga 


ggcggataaa 


gttgcaggac 


cacttctgcg 


ctcggccctt 


ccggctggct 


1800 


ggtttattgc 


tgataaatct 


ggagccggtg 


agcgtgggtc 


tcgcggtatc 


attgcagcac 


1860 


tggggccaga 


tggtaagccc 


tcccgtatcg 


tagttatcta 


cacgacgggg 


agtcaggcaa 


1920 


ctatggatga 


acgaaataga 


cagatcgctg 


agataggtgc 


ctcactgatt 


aagcattggt 


1980 


aactgtcaga 


ccaagtttac 


tcatatatac 


tttagattga 


tttaaaactt 


catttttaat 


2040 


ttaaaaggat 


ctaggtgaag 


atcctttttg 


ataatctcat 


gaccaaaatc 


ccttaacgtg 


2100 


agttttcgtt 


ccactgagcg 


tcagaccccg 


tagaaaagat 


caaaggatct 


tcttgagatc 


2160 


ctttttttct 


gcgcgtaatc 


tgctgcttgc 


aaacaaaaaa 


accaccgcta 


ccagcggtgg 


2220 
tttgtttgcc


ggatcaagag 


ctaccaactc 


tttttccgaa 


ggtaactggc 


ttcagcagag 


2280 


cgcagatacc 


aaatactgtc 


cttctagtgt 


agccgtagtt 


aggccaccac 


ttcaagaact 


2340 


ctgtagcacc 


gcctacatac 


ctcgctctgc 


taatcctgtt 


accagtggct 


gctgccagtg 


2400 


gcgataagtc 


gtgtcttacc 


gggttggact 


caagacgata 


gttaccggat 


aaggcgcagc 


2460 


ggtcgggctg 


aacggggggt 


tcgtgcacac 


agcccagctt 


ggagcgaacg 


acctacaccg 


2520 


aactgagata 


cctacagcgt 


gagctatgag 


aaagcgccac 


gcttcccgaa 


gggagaaagg 


2580 


cggacaggta 


tccggtaagc 


ggcagggtcg 


gaacaggaga 


gcgcacgagg 


gagcttccag


2640 


ggggaaacgc 


ctggtatctt 


tatagtcctg 


tcgggtttcg 


ccacctctga 


cttgagcgtc 


2700 


gatttttgtg 


atgctcgtca 


ggggggcgga 


gcctatggaa 


aaacgccagc 


aacgcggcct 


2760 


ttttacggtt 


cctggccttt 


tgctggcctt 


ttgctcacat 


gttctttcct 


gcgttatccc 


2820 


ctgattctgt 


ggataaccgt 


attaccgcct 


ttgagtgagc 


tgataccgct 


cgccgcagcc 


2880 


gaacgaccga 


gcgcagcgag 


tcagtgagcg 


aggaagcgga 


agagcgcctg 


atgcggtatt 


2940 


ttctccttac 


gcatctgtgc 


ggtatttcac 


accgcatata 


tggtgcactc 


tcagtacaat 


3000 


ctgctctgat 


gccgcatagt 


taagccagta 


tacactccgc 


tatcgctacg 


tgactgggtc 


3060 


atggctgcgc 


cccgacaccc 


gccaacaccc 


gctgacgcgc 


cctgacgggc 


ttgtctgctc 


3120 


ccggcatccg 


cttacagaca 


agctgtgacc 


gtctccggga 


gctgcatgtg 


tcagaggttt 


3180 


tcaccgtcat 


caccgaaacg 


cgcgaggcag 


ctgcggtaaa 


gctcatcagc 


gtggtcgtga 


3240 


agcgattcac 


agatgtctgc 


ctgttcatcc 


gcgtccagct 


cgttgagttt 


ctccagaagc 


3300 


gttaatgtct 


ggcttctgat 


aaagcgggcc 


atgttaaggg 


cggttttttc 


ctgtttggtc 


3360 


actgatgcct 


ccgtgtaagg 


gggatttctg 


ttcatggggg 


taatgatacc 


gatgaaacga 


3420 


gagaggatgc 


tcacgatacg 


ggttactgat 


gatgaacatg 


cccggttact 


ggaacgttgt 


3480 


gagggtaaac 


aactggcggt 


atggatgcgg 


cgggaccaga 


gaaaaatcac 


tcagggtcaa 


3540 


tgccagcgct 


tcgttaatac 


agatgtaggt 


gttccacagg 


gtagccagca 


gcatcctgcg 


3600 


atgcagatcc 


ggaacataat 


ggtgcagggc 


gctgacttcc 


gcgtttccag 


actttacgaa 


3660 


acacggaaac 


cgaagaccat 


tcatgttgtt 


gctcaggtcg 


cagacgtttt 


gcagcagcag 


3720 


tcgcttcacg 


ttcgctcgcg 


tatcggtgat 


tcattctgct 


aaccagtaag 


gcaaccccgc 


3780 


cagcctagcc 


gggtcctcaa 


cgacaggagc


acgatcatgc 


gcacccgtgg 


ccaggaccca 


3840 


acgctgcccg 


agatgcgccg 


cgtgcggctg 


ctggagatgg 


cggacgcgat 


ggatatgttc 


3900 


tgccaagggt 


tggtttgcgc 


attcacagtt 


ctccgcaaga 


attgattggc 


tccaattctt 


3960 


ggagtggtga 


atccgttagc 


gaggtgccgc 


cggcttccat 


tcaggtcgag 


gtggcccggc 


4020 


tccatgcacc 


gcgacgcaac 


gcggggaggc 


agacaaggta 


tagggcggcg 


cctacaatcc 


4080 
atgccaaccc


gttccatgtg 


ctcgccgagg 


cggcataaat 


cgccgtgacg 


atcagcggtc 


4140 


cagtgatcga 


agttaggctg 


gtaagagccg 


cgagcgatcc 


ttgaagctgt 


ccctgatggt 


4200 


cgtcatctac 


ctgcctggac 


agcatggcct 


gcaacgcggg 


catcccgatg 


ccgccggaag 


4260 


cgagaagaat 


cataatgggg 


aaggccatcc 


agcctcgcgt 


cgcgaacgcc 


agcaagacgt 


4320 


agcccagcgc 


gtcggccgcc 


atgccggcga 


taatggcctg 


cttctcgccg 


aaacgtttgg 


4380 


tggcgggacc 


agtgacgaag 


gcttgagcga 


gggcgtgcaa 


gattccgaat 


accgcaagcg 


4440 


acaggccgat 


catcgtcgcg 


ctccagcgaa 


agcggtcctc 


gccgaaaatg 


acccagagcg 


4500 


ctgccggcac 


ctgtcctacg 


agttgcatga 


taaagaagac 


agtcataagt 


gcggcgacga 


4560 


tagtcatgcc 


ccgcgcccac 


cggaaggagc 


tgactgggtt 


gaaggctctc 


aagggcatcg 


4620 


gtcgagatcc 


cggtgcctaa 


tgagtgagct 


aacttacatt 


aattgcgttg 


cgctcactgc 


4680 


ccgctttcca 


gtcgggaaac 


ctgtcgtgcc 


agctgcatta 


atgaatcggc 


caacgcgcgg 


4740 


ggagaggcgg 


tttgcgtatt 


gggcgccagg 


gtggtttttc 


ttttcaccag 


tgagacgggc 


4800 


aacagctgat 


tgcccttcac 


cgcctggccc 


tgagagagtt 


gcagcaagcg 


gtccacgctg 


4860 


gtttgcccca 


gcaggcgaaa 


atcctgtttg 


atggtggtta 


acggcgggat 


ataacatgag 


4920 


ctgtcttcgg 


tatcgtcgta 


tcccactacc 


gagatatccg 


caccaacgcg 


cagcccggac 


4980 


tcggtaatgg 


cgcgcattgc 


gcccagcgcc 


atctgatcgt 


tggcaaccag 


catcgcagtg 


5040 


ggaacgatgc 


cctcattcag 


catttgcatg 


gtttgttgaa 


aaccggacat 


ggcactccag 


5100 


tcgccttccc 


gttccgctat 


cggctgaatt 


tgattgcgag 


tgagatattt 


atgccagcca 


5160 


gccagacgca 


gacgcgccga 


gacagaactt 


aatgggcccg 


ctaacagcgc 


gatttgctgg 


5220 


tgacccaatg 


cgaccagatg 


ctccacgccc 


agtcgcgtac 


cgtcttcatg 


ggagaaaata 


5280 


atactgttga 


tgggtgtctg 


gtcagagaca 


tcaagaaata 


acgccggaac 


attagtgcag 


5340 


gcagcttcca 


cagcaatggc 


atcctggtca 


tccagcggat 


agttaatgat 


cagcccactg 


5400 


acgcgttgcg 


cgagaagatt 


gtgcaccgcc 


gctttacagg 


cttcgacgcc 


gcttcgttct 


5460 


accatcgaca 


ccaccacgct 


ggcacccagt 


tgatcggcgc 


gagatttaat 


cgccgcgaca 


5520 


atttgcgacg 


gcgcgtgcag 


ggccagactg 


gaggtggcaa 


cgccaatcag 


caacgactgt 


5580 


ttgcccgcca 


gttgttgtgc 


cacgcggttg 


ggaatgtaat 


tcagctccgc 


catcgccgct 


5640 


tccacttttt 


cccgcgtttt 


cgcagaaacg 


tggctggcct 


ggttcaccac 


gcgggaaacg 


5700 


gtctgataag 


agacaccggc 


atactctgcg 


acatcgtata 


acgttactgg 


tttcacattc 


5760 


accaccctga 


attgactctc 


ttccgggcgc 


tatcatgcca 


taccgcgaaa 


ggttttgcgc


5820 


cattcgatgg 


tgtccgggat 


ctcgacgctc 


tcccttatgc 


gactcctgca 


ttaggaagca 


5880 


gcccagtagt 


aggttgaggc 


cgttgagcac 


cgccgccgca 


aggaatggtg 


catg 


5934 
<210> 3

<211> 6959 

<212> DNA 

<213> Artificial 

<220> 

<223> pcDNA/Biotag-DEST

<400> 3 



gacggatcgg 


gagatctccc 


gatcccctat 


ggtcgactct 


cagtacaatc 


tgctctgatg 


60 


ccgcatagtt 


aagccagtat 


ctgctccctg 


cttgtgtgtt 


ggaggtcgct 


gagtagtgcg 


120 


cgagcaaaat 


ttaagctaca 


acaaggcaag 


gcttgaccga 


caattgcatg 


aagaatctgc 


180 


ttagggttag 


gcgttttgcg 


ctgcttcgcg 


atgtacgggc 


cagatatacg 


cgttgacatt 


240 


gattattgac 


tagttattaa 


tagtaatcaa 


ttacggggtc 


attagttcat 


agcccatata 


300 


tggagttccg 


cgttacataa 


cttacggtaa 


atggcccgcc 


tggctgaccg 


cccaacgacc 


360 


cccgcccatt 


gacgtcaata 


atgacgtatg 


ttcccatagt 


aacgccaata 


gggactttcc 


420 


attgacgtca 


atgggtggac 


tatttacggt 


aaactgccca 


cttggcagta 


catcaagtgt 


480 


atcatatgcc 


aagtacgccc 


cctattgacg 


tcaatgacgg 


taaatggccc 


gcctggcatt 


540 


atgcccagta 


catgacctta 


tgggactttc 


ctacttggca 


gtacatctac 


gtattagtca 


600 


tcgctattac 


catggtgatg 


cggttttggc 


agtacatcaa 


tgggcgtgga 


tagcggtttg 


660 


actcacgggg 


atttccaagt 


ctccacccca 


ttgacgtcaa 


tgggagtttg 


ttttggcacc 


720 


aaaatcaacg 


ggactttcca 


aaatgtcgta 


acaactccgc 


cccattgacg 


caaatgggcg 


780 


gtaggcgtgt 


acggtgggag 


gtctatataa 


gcagagctct 


ctggctaact 


agagaaccca 


840 


ctgcttactg 


gcttatcgaa 


attaatacga 


ctcactatag 


ggagacccaa 


gctggctagc 


900 


gtttaaactt 


aagcttacca 


tgggcgccgg 


caccccggtg 


accgccccgc 


tggcgggcac 


960 


tatctggaag 


gtgctggcca 


gcgaaggcca 


gacggtggcc 


gcaggcgagg 


tgctgctgat 


1020 


tctggaagcc 


atgaagatgg 


aaaccgaaat 


ccgcgccgcg 


caggccggga 


ccgtgcgcgg 


1080 


tatcgcggtg 


aaagccggcg 


acgcggtggc 


ggtcggcgac 


accctgatga 


ccctggcggg 


1140 


ctctggatcc 


gatctgtacg 


acgatgacga 


taaggtacat 


caaacaagtt 


tgtacaaaaa 


1200 


agctgaacga 


gaaacgtaaa 


atgatataaa 


tatcaatata 


ttaaattaga 


ttttgcataa 


1260 


aaaacagact 


acataatact 


gtaaaacaca 


acatatccag 


tcactatggc 


ggccgcatta 


1320 


ggcaccccag 


gctttacact 


ttatgcttcc 


ggctcgtata 


atgtgtggat 


tttgagttag 


1380 


gatccggcga 


gattttcagg 


agctaaggaa 


gctaaaatgg 


agaaaaaaat 


cactggatat 


1440 


accaccgttg 


atatatccca 


atggcatcgt 


aaagaacatt 


ttgaggcatt 


tcagtcagtt 


1500 


gctcaatgta 


cctataacca 


gaccgttcag 


ctggatatta 


cggccttttt 


aaagaccgta 


1560 


aagaaaaata 


agcacaagtt 


ttatccggcc 


tttattcaca 


ttcttgcccg 


cctgatgaat 


1620 
gctcatccgg


aattccgtat 


ggcaatgaaa 


gacggtgagc 


tggtgatatg 


ggatagtgtt 


1680 


cacccttgtt 


acaccgtttt 


ccatgagcaa 


actgaaacgt 


tttcatcgct 


ctggagtgaa 


1740 


taccacgacg 


atttccggca 


gtttctacac 


atatattcgc 


aagatgtggc 


gtgttacggt 


1800 


gaaaacctgg 


cctatttccc 


taaagggttt 


attgagaata 


tgtttttcgt 


ctcagccaat 


1860 


ccctgggtga 


gtttcaccag 


ttttgattta 


aacgtggcca 


atatggacaa 


cttcttcgcc 


1920 


cccgttttca 


ccatgggcaa 


atattatacg 


caaggcgaca 


aggtgctgat 


gccgctggcg 


1980 


attcaggttc 


atcatgccgt 


ctgtgatggc 


ttccatgtcg 


gcagaatgct 


taatgaatta 


2040 


caacagtact 


gcgatgagtg 


gcagggcggg 


gcgtaaacgc 


gtggatccgg 


cttactaaaa 


2100 


gccagataac 


agtatgcgta 


tttgcgcgct 


cgcgaaccgg 


tgtatacccg 


aagtatgtca 


2160 


aaaagaggtg 


tgctatgaag 


cagcgtatta 


cagtgacagt 


tgacagcgac 


agctatcagt 


2220 


tgctcaaggc 


atatatgatg 


tcaatatctc 


cggtctggta 


agcacaacca 


tgcagaatga 


2280 


agcccgtcgt 


ctgcgtgccg 


aacgctggaa 


agcggaaaat 


caggaaggga 


tggctgaggt 


2340 


cgcccggttt 


attgaaatga 


acggctcttt 


tgctgacgag 


aacagggact 


ggtgaaatgc 


2400 


agtttaaggt 


ttacacctat 


aaaagagaga 


gccgttatcg 


tctgtttgtg 


gatgtacaga 


2460 


gtgatattat 


tgacacgccc 


gggcgacgga 


tggtgatccc 


cctggccagt 


gcacgtctgc 


2520 


tgtcagataa 


agtctcccgt 


gaactttacc 


cggtggtgca 


tatcggggat 


gaaagctggc 


2580 


gcatgatgac 


caccgatatg 


gccagtgtgc 


cggtctccgt 


tatcggggaa 


gaagtggctg 


2640 


atctcagcca 


ccgcgaaaat 


gacatcaaaa 


acgccattaa 


cctgatgttc 


tggggaatat 


2700 


aaatgtcagg 


ctccgttata 


cacagccagt 


ctgcaggtcg 


accatagtga 


ctggatatgt 


2760 


tgtgttttac 


agtattatgt 


agtctgtttt 


ttatgcaaaa 


tctaatttaa 


tatattgata 


2820 


tttatatcat 


tttacgtttc 


tcgttcagct 


ttcttgtaca 


aagtggtgat 


aattaattaa 


2880 


gatctagagg 


gcccgtttaa 


acccgctgat 


cagcctcgac 


tgtgccttct 


agttgccagc 


2940 


catctgttgt 


ttgcccctcc 


cccgtgcctt 


ccttgaccct 


ggaaggtgcc 


actcccactg 


3000 


tcctttccta 


ataaaatgag 


gaaattgcat 


cgcattgtct 


gagtaggtgt 


cattctattc 


3060 


tggggggtgg 


ggtggggcag 


gacagcaagg 


gggaggattg 


ggaagacaat 


agcaggcatg 


3120 


ctggggatgc 


ggtgggctct 


atggcttctg 


aggcggaaag 


aaccagctgg 


ggctctaggg 


3180 


ggtatcccca 


cgcgccctgt 


agcggcgcat 


taagcgcggc 


gggtgtggtg 


gttacgcgca 


3240 


gcgtgaccgc 


tacacttgcc 


agcgccctag 


cgcccgctcc 


tttcgctttc 


ttcccttcct 


3300 


ttctcgccac 


gttcgccggc 


tttccccgtc 


aagctctaaa 


tcggggcatc 


cctttagggt 


3360 


tccgatttag 


tgctttacgg 


cacctcgacc 


ccaaaaaact 


tgattagggt 


gatggttcac 


3420 


gtagtgggcc 


atcgccctga 


tagacggttt 


ttcgcccttt 


gacgttggag 


tccacgttct 


3480


ttaatagtgg 


actcttgttc 


caaactggaa 


caacactcaa 


ccctatctcg 


gtctattctt 


3540 


ttgatttata 


agggattttg 


gggatttcgg 


cctattggtt 


aaaaaatgag 


ctgatttaac 


3600 


aaaaatttaa 


cgcgaattaa 


ttctgtggaa 


tgtgtgtcag 


ttagggtgtg 


gaaagtcccc 


3660 


aggctcccca 


ggcaggcaga 


agtatgcaaa 


gcatgcatct 


caattagtca 


gcaaccaggt 


3720 


gtggaaagtc 


cccaggctcc 


ccagcaggca 


gaagtatgca 


aagcatgcat 


ctcaattagt 


3780 


cagcaaccat 


agtcccgccc 


ctaactccgc 


ccatcccgcc 


cctaactccg 


cccagttccg 


3840 


cccattctcc 


gccccatggc 


tgactaattt 


tttttattta 


tgcagaggcc 


gaggccgcct 


3900 


ctgcctctga 


gctattccag 


aagtagtgag 


gaggcttttt 


tggaggccta 


ggcttttgca 


3960 


aaaagctccc 


gggagcttgt 


atatccattt 


tcggatctga 


tcagcacgtg 


ttgacaatta 


4020 


atcatcggca 


tagtatatcg 


gcatagtata 


atacgacaag 


gtgaggaact 


aaaccatggc 


4080 


caagcctttg 


tctcaagaag 


aatccaccct 


cattgaaaga 


gcaacggcta 


caatcaacag 


4140 


catccccatc 


tctgaagact 


acagcgtcgc 


cagcgcagct 


ctctctagcg 


acggccgcat 


4200 


cttcactggt 


gtcaatgtat 


atcattttac 


tgggggacct 


tgtgcagaac 


tcgtggtgct 


4260 


gggcactgct 


gctgctgcgg 


cagctggcaa 


cctgacttgt 


atcgtcgcga 


tcggaaatga 


4320 


gaacaggggc 


atcttgagcc 


cctgcggacg 


gtgccgacag 


gtgcttctcg 


atctgcatcc 


4380 


tgggatcaaa 


gccatagtga 


aggacagtga 


tggacagccg 


acggcagttg 


ggattcgtga 


4440 


attgctgccc 


tctggttatg 


tgtgggaggg 


ctaagcactt 


cgtggccgag 


gagcaggact 


4500 


gacacgtgct 


acgagatttc 


gattccaccg 


ccgccttcta 


tgaaaggttg 


ggcttcggaa 


4560 


tcgttttccg 


ggacgccggc 


tggatgatcc 


tccagcgcgg 


ggatctcatg 


ctggagttct 


4620 


tcgcccaccc 


caacttgttt 


attgcagctt 


ataatggtta 


caaataaagc 


aatagcatca 


4680 


caaatttcac 


aaataaagca 


tttttttcac 


tgcattctag 


ttgtggtttg 


tccaaactca 


4740 


tcaatgtatc 


ttatcatgtc 


tgtataccgt 


cgacctctag 


ctagagcttg 


gcgtaatcat 


4800 


ggtcatagct 


gtttcctgtg 


tgaaattgtt 


atccgctcac 


aattccacac 


aacatacgag 


4860


ccggaagcat 


aaagtgtaaa 


gcctggggtg 


cctaatgagt 


gagctaactc 


acattaattg 


4920 


cgttgcgctc 


actgcccgct 


ttccagtcgg 


gaaacctgtc 


gtgccagctg 


cattaatgaa 


4980 


tcggccaacg 


cgcggggaga 


ggcggtttgc 


gtattgggcg 


ctcttccgct 


tcctcgctca 


5040 


ctgactcgct 


gcgctcggtc 


gttcggctgc 


ggcgagcggt 


atcagctcac 


tcaaaggcgg 


5100 


taatacggtt 


atccacagaa 


tcaggggata 


acgcaggaaa 


gaacatgtga 


gcaaaaggcc 


5160 


agcaaaaggc 


caggaaccgt 


aaaaaggccg 


cgttgctggc 


gtttttccat 


aggctccgcc 


5220 


cccctgacga 


gcatcacaaa 


aatcgacgct 


caagtcagag 


gtggcgaaac 


ccgacaggac 


5280 


tataaagata 


ccaggcgttt 


ccccctggaa 


gctccctcgt 


gcgctctcct 


gttccgaccc 


5340 
tgccgcttac


cggatacctg 


tccgcctttc 


tcccttcggg 


aagcgtggcg 


ctttctcaat 


5400 


gctcacgctg 


taggtatctc 


agttcggtgt 


aggtcgttcg 


ctccaagctg 


ggctgtgtgc 


5460 


acgaaccccc 


cgttcagccc 


gaccgctgcg 


ccttatccgg 


taactatcgt 


cttgagtcca 


5520


acccggtaag 


acacgactta 


tcgccactgg 


cagcagccac 


tggtaacagg 


attagcagag 


5580 


cgaggtatgt 


aggcggtgct 


acagagttct 


tgaagtggtg 


gcctaactac 


ggctacacta 


5640 


gaaggacagt 


atttggtatc 


tgcgctctgc 


tgaagccagt 


taccttcgga 


aaaagagttg 


5700 


gtagctcttg 


atccggcaaa 


caaaccaccg 


ctggtagcgg 


tggttttttt 


gtttgcaagc 


5760 


agcagattac 


gcgcagaaaa 


aaaggatctc 


aagaagatcc 


tttgatcttt 


tctacggggt 


5820 


ctgacgctca 


gtggaacgaa 


aactcacgtt 


aagggatttt 


ggtcatgaga 


ttatcaaaaa 


5880 


ggatcttcac 


ctagatcctt 


ttaaattaaa 


aatgaagttt 


taaatcaatc 


taaagtatat 


5940 


atgagtaaac 


ttggtctgac 


agttaccaat 


gcttaatcag 


tgaggcacct 


atctcagcga 


6000 


tctgtctatt 


tcgttcatcc 


atagttgcct 


gactccccgt 


cgtgtagata 


actacgatac 


6060 


gggagggctt 


accatctggc 


cccagtgctg 


caatgatacc 


gcgagaccca 


cgctcaccgg 


6120 


ctccagattt 


atcagcaata 


aaccagccag 


ccggaagggc 


cgagcgcaga 


agtggtcctg 


6180 


caactttatc 


cgcctccatc 


cagtctatta 


attgttgccg 


ggaagctaga 


gtaagtagtt 


6240 


cgccagttaa 


tagtttgcgc 


aacgttgttg 


ccattgctac 


aggcatcgtg 


gtgtcacgct 


6300 


cgtcgtttgg 


tatggcttca


ttcagctccg 


gttcccaacg 


atcaaggcga 


gttacatgat 


6360 


cccccatgtt 


gtgcaaaaaa 


gcggttagct 


ccttcggtcc 


tccgatcgtt 


gtcagaagta 


6420 


agttggccgc 


agtgttatca 


ctcatggtta 


tggcagcact 


gcataattct 


cttactgtca 


6480 


tgccatccgt 


aagatgcttt 


tctgtgactg 


gtgagtactc 


aaccaagtca 


ttctgagaat 


6540 


agtgtatgcg 


gcgaccgagt 


tgctcttgcc 


cggcgtcaat 


acgggataat 


accgcgccac 


6600 


atagcagaac 


tttaaaagtg 


ctcatcattg 


gaaaacgttc 


ttcggggcga 


aaactctcaa 


6660 


ggatcttacc 


gctgttgaga 


tccagttcga 


tgtaacccac 


tcgtgcaccc 


aactgatctt 


6720 


cagcatcttt 


tactttcacc 


agcgtttctg 


ggtgagcaaa 


aacaggaagg 


caaaatgccg 


6780 


caaaaaaggg 


aataagggcg 


acacggaaat 


gttgaatact 


catactcttc 


ctttttcaat 


6840 


attattgaag 


catttatcag 


ggttattgtc 


tcatgagcgg 


atacatattt 


gaatgtattt 


6900 


agaaaaataa 


acaaataggg 


gttccgcgca 


catttccccg 


aaaagtgcca 


cctgacgtc 


6959 
<400> 4

gacggatcgg gagatctccc 


gatcccctat 


ggtcgactct 


cagtacaatc 


tgctctgatg 


60 


ccgcatagtt aagccagtat 


ctgctccctg 


cttgtgtgtt 


ggaggtcgct 


gagtagtgcg 


120 


cgagcaaaat ttaagctaca 


acaaggcaag 


gcttgaccga 


caattgcatg 


aagaatctgc 


180 


ttagggttag gcgttttgcg 


ctgcttcgcg 


atgtacgggc 


cagatatacg 


cgttgacatt 


240 


gattattgac tagttattaa 


tagtaatcaa 


ttacggggtc 


attagttcat 


agcccatata 


300 


tggagttccg cgttacataa 


cttacggtaa 


atggcccgcc 


tggctgaccg 


cccaacgacc 


360 


cccgcccatt gacgtcaata 


atgacgtatg 


ttcccatagt 


aacgccaata 


gggactttcc 


420 


attgacgtca atgggtggac 


tatttacggt 


aaactgccca 


cttggcagta 


catcaagtgt 


480 


atcatatgcc aagtacgccc 


cctattgacg 


tcaatgacgg 


taaatggccc 


gcctggcatt 


540 


atgcccagta catgacctta 


tgggactttc 


ctacttggca 


gtacatctac 


gtattagtca 


600 


tcgctattac catggtgatg 


cggttttggc 


agtacatcaa 


tgggcgtgga 


tagcggtttg 


660 


actcacgggg atttccaagt 


ctccacccca 


ttgacgtcaa 


tgggagtttg 


ttttggcacc 


720 


aaaatcaacg ggactttcca 


aaatgtcgta 


acaactccgc 


cccattgacg 


caaatgggcg 


780 


gtaggcgtgt acggtgggag 


gtctatataa 


gcagagctct 


ctggctaact 


agagaaccca 


840 


ctgcttactg gcttatcgaa 


attaatacga 


ctcactatag 


ggagacccaa 


gctggctagc 


900 


gtttaaactt aagcttacca 


tgggcgccgg 


caccccggtg 


accgccccgc 


tggcgggcac 


960 


tatctggaag gtgctggcca 


gcgaaggcca 


gacggtggcc 


gcaggcgagg 


tgctgctgat 


1020 


tctggaagcc atgaagatgg 


aaaccgaaat 


ccgcgccgcg 


caggccggga 


ccgtgcgcgg 


1080 


tatcgcggtg aaagccggcg 


acgcggtggc 


ggtcggcgac 


accctgatga 


ccctggcggg 


1140 


ctctggatcc gatctgtacg 


acgatgacga 


taaggtacct 


aggatccagt 


gtggtggaat 


1200 


tgatcccttc accaagggcg 


tcgagtctag 


agggcccgtt 


taaacccgct 


gatcagcctc 


1260 


gactgtgcct tctagttgcc 


agccatctgt 


tgtttgcccc 


tcccccgtgc 


cttccttgac 


1320 


cctggaaggt gccactccca 


ctgtcctttc 


ctaataaaat 


gaggaaattg 


catcgcattg 


1380 


tctgagtagg tgtcattcta 


ttctgggggg 


tggggtgggg 


caggacagca 


agggggagga 


1440 


ttgggaagac aatagcaggc 


atgctgggga 


tgcggtgggc 


tctatggctt 


ctgaggcgga 


1500 


aagaaccagc tggggctcta 


gggggtatcc 


ccacgcgccc 


tgtagcggcg 


cattaagcgc 


1560 


ggcgggtgtg gtggttacgc 


gcagcgtgac 


cgctacactt 


gccagcgccc 


tagcgcccgc 


1620 


tcctttcgct ttcttccctt 


cctttctcgc 


cacgttcgcc 


ggctttcccc 


gtcaagctct 


1680 


aaatcggggc atccctttag 


ggttccgatt 


tagtgcttta 


cggcacctcg 


accccaaaaa 


1740 


acttgattag ggtgatggtt 


cacgtagtgg 


gccatcgccc 


tgatagacgg 


tttttcgccc 


1800 


tttgacgttg gagtccacgt 


tctttaatag 


tggactcttg 


ttccaaactg 


gaacaacact 


1860 
caaccctatc


tcggtctatt 


cttttgattt 


ataagggatt 


ttggggattt 


cggcctattg 


1920 


gttaaaaaat 


gagctgattt 


aacaaaaatt 


taacgcgaat 


taattctgtg 


gaatgtgtgt 


1980 


cagttagggt 


gtggaaagtc 


cccaggctcc 


ccaggcaggc 


agaagtatgc 


aaagcatgca 


2040 


tctcaattag 


tcagcaacca 


ggtgtggaaa 


gtccccaggc 


tccccagcag 


gcagaagtat 


2100 


gcaaagcatg 


catctcaatt 


agtcagcaac 


catagtcccg 


cccctaactc 


cgcccatccc 


2160 


gcccctaact 


ccgcccagtt 


ccgcccattc 


tccgccccat 


ggctgactaa 


ttttttttat 


2220 


ttatgcagag 


gccgaggccg 


cctctgcctc 


tgagctattc 


cagaagtagt 


gaggaggctt 


2280 


ttttggaggc 


ctaggctttt 


gcaaaaagct 


cccgggagct 


tgtatatcca 


ttttcggatc 


2340 


tgatcagcac 


gtgttgacaa 


ttaatcatcg gcatagtata 


tcggcatagt 


ataatacgac 


2400 


aaggtgagga 


actaaaccat 


ggccaagcct 


ttgtctcaag 


aagaatccac 


cctcattgaa 


2460 


agagcaacgg 


ctacaatcaa 


cagcatcccc 


atctctgaag 


actacagcgt 


cgccagcgca 


2520 


gctctctcta 


gcgacggccg 


catcttcact 


ggtgtcaatg 


tatatcattt 


tactggggga 


2580 


ccttgtgcag 


aactcgtggt 


gctgggcact 


gctgctgctg 


cggcagctgg 


caacctgact 


2640 


tgtatcgtcg 


cgatcggaaa 


tgagaacagg ggcatcttga 


gcccctgcgg 


acggtgccga 


2700 


caggtgcttc 


tcgatctgca 


tcctgggatc 


aaagccatag 


tgaaggacag 


tgatggacag 


2760 


ccgacggcag 


ttgggattcg 


tgaattgctg 


ccctctggtt 


atgtgtggga 


gggctaagca 


2820 


cttcgtggcc 


gaggagcagg 


actgacacgt gctacgagat 


ttcgattcca 


ccgccgcctt 


2880 


ctatgaaagg 


ttgggcttcg 


gaatcgtttt 


ccgggacgcc 


ggctggatga 


tcctccagcg 


2940 


cggggatctc 


atgctggagt 


tcttcgccca 


ccccaacttg 


tttattgcag 


cttataatgg 


3000 


ttacaaataa 


agcaatagca 


tcacaaattt 


cacaaataaa 


gcattttttt 


cactgcattc 


3060 


tagttgtggt 


ttgtccaaac 


tcatcaatgt 


atcttatcat 


gtctgtatac 


cgtcgacctc 


3120 


tagctagagc 


ttggcgtaat 


catggtcata 


gctgtttcct 


gtgtgaaatt 


gttatccgct 


3180 


cacaattcca 


cacaacatac 


gagccggaag 


cataaagtgt 


aaagcctggg 


gtgcctaatg 


3240 


agtgagctaa 


ctcacattaa 


ttgcgttgcg 


ctcactgccc 


gctttccagt


cgggaaacct


3300 


gtcgtgccag 


ctgcattaat 


gaatcggcca 


acgcgcgggg 


agaggcggtt 


tgcgtattgg 


3360 


gcgctcttcc 


gcttcctcgc 


tcactgactc 


gctgcgctcg 


gtcgttcggc 


tgcggcgagc 


3420 


ggtatcagct 


cactcaaagg 


cggtaatacg 


gttatccaca 


gaatcagggg 


ataacgcagg 


3480 


aaagaacatg 


tgagcaaaag 


gccagcaaaa 


ggccaggaac 


cgtaaaaagg 


ccgcgttgct 


3540 


ggcgtttttc 


cataggctcc 


gcccccctga 


cgagcatcac 


aaaaatcgac 


gctcaagtca 


3600 


gaggtggcga 


aacccgacag 


gactataaag 


ataccaggcg 


tttccccctg 


gaagctccct 


3660 


cgtgcgctct 


cctgttccga 


ccctgccgct taccggatac


ctgtccgcct 


ttctcccttc 


3720 
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gggaagcgtg gcgctttctc 


aatgctcacg 


ctgtaggtat 


ctcagttcgg


tgtaggtcgt


3780


tcgctccaag 


ctgggctgtg 


tgcacgaacc 


ccccgttcag 


cccgaccgct 


gcgccttatc 


3840


cggtaactat cgtcttgagt 


ccaacccggt 


aagacacgac 


ttatcgccac 


tggcagcagc 


3900


cactggtaac aggattagca 


gagcgaggta


tgtaggcggt 


gctacagagt 


tcttgaagtg 


3960


gtggcctaac 


tacggctaca 


ctagaaggac 


agtatttggt 


atctgcgctc 


tgctgaagcc


4020 


agttaccttc 


ggaaaaagag 


ttggtagctc 


ttgatccggc


aaacaaacca 


ccgctggtag 


4080 


cggtggtttt 


tttgtttgca 


agcagcagat 


tacgcgcaga


aaaaaaggat 


ctcaagaaga 


4140 


tcctttgatc 


ttttctacgg


ggtctgacgc


tcagtggaac 


gaaaactcac 


gttaagggat 


4200


tttggtcatg agattatcaa 


aaaggatctt 


cacctagatc 


cttttaaatt 


aaaaatgaag 


4260 


ttttaaatca


atctaaagta


tatatgagta 


aacttggtct 


gacagttacc 


aatgcttaat 


4320 


cagtgaggca


cctatctcag


cgatctgtct 


atttcgttca


tccatagttg 


cctgactccc 


4380 


cgtcgtgtag


ataactacga


tacgggaggg 


cttaccatct 


ggccccagtg 


ctgcaatgat


4440


accgcgagac


ccacgctcac


cggctccaga 


tttatcagca 


ataaaccagc 


cagccggaag


4500 


ggccgagcgc 


agaagtggtc 


ctgcaacttt 


atccgcctcc 


atccagtcta 


ttaattgttg 


4560 


ccgggaagct


agagtaagta 


gttcgccagt 


taatagtttg 


cgcaacgttg 


ttgccattgc 


4620 


tacaggcatc 


gtggtgtcac 


gctcgtcgtt


tggtatggct 


tcattcagct 


ccggttccca 


4680 


acgatcaagg 


cgagttacat 


gatcccccat 


gttgtgcaaa 


aaagcggtta 


gctccttcgg 


4740 


tcctccgatc 


gttgtcagaa 


gtaagttggc 


cgcagtgtta 


tcactcatgg 


ttatggcagc


4800


actgcataat


tctcttactg 


tcatgccatc 


cgtaagatgc 


ttttctgtga 


ctggtgagta 


4860


ctcaaccaag 


tcattctgag 


aatagtgtat 


gcggcgaccg 


agttgctctt 


gcccggcgtc


4920


aatacgggat


aataccgcgc 


cacatagcag 


aactttaaaa 


gtgctcatca


ttggaaaacg


4980 


ttcttcgggg 


cgaaaactct 


caaggatctt 


accgctgttg 


agatccagtt 


cgatgtaacc 


5040


cactcgtgca 


cccaactgat 


cttcagcatc 


ttttactttc 


accagcgttt 


ctgggtgagc 


5100 


aaaaacagga 


aggcaaaatg 


ccgcaaaaaa 


gggaataagg 


gcgacacgga 


aatgttgaat 


5160 


actcatactc 


ttcctttttc 


aatattattg 


aagcatttat 


cagggttatt 


gtctcatgag 


5220 


cggatacata


tttgaatgta 


tttagaaaaa 


taaacaaata 


ggggttccgc 


gcacatttcc 


5280 


ccgaaaagtg 


ccacctgacg 


tc 








5302 



<210> 5 

<211> 5375 

<212> DNA 

<213> Artificial 



<220> 

<223> pMT/Biotag-DEST 
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<400> 5 
tcgcgcgttt 


cggtgatgac 


ggtgaaaacc 


tctgacacat 


gcagctcccg 


gagacggtca 


60 


cagcttgtct 


gtaagcggat 


gccgggagca 


gacaagcccg 


tcagggcgcg 


tcagcgggtg 


120 


ttggcgggtg 


tcggggctgg 


cttaactatg 


cggcatcaga 


gcagattgta 


ctgagagtgc 


180 


accatatgcg 


gtgtgaaata 


ccgcacagat 


gcgtaaggag 


aaaataccgc 


atcaggcgcc 


240 


attcgccatt 


caggctgcgc 


aactgttggg 


aagggcgatc 


ggtgcgggcc 


tcttcgctat 


300 


tacgccagct 


ggcgaaaggg 


ggatgtgctg 


caaggcgatt 


aagttgggta 


acgccagggt 


360 


tttcccagtc 


acgacgttgt 


aaaacgacgg 


ccagtgccag 


tgaattaatt 


cgttgcagga 


420 


caggatgtgg 


tgcccgatgt 


gactagctct 


ttgctgcagg 


ccgtcctatc 


ctctggttcc 


480 


gataagagac 


ccagaactcc 


ggccccccac 


cgcccaccgc 


cacccccata 


catatgtggt 


540 


acgcaagtaa 


gagtgcctgc 


gcatgcccca 


tgtgccccac 


caagagtttt 


gcatcccata 


600 


caagtcccca 


aagtggagaa 


ccgaaccaat 


tcttcgcggg 


cagaacaaaa 


gcttctgcac 


660 


acgtctccac 


tcgaatttgg 


agccggccgg 


cgtgtgcaaa 


agaggtgaat 


cgaacgaaag


720 


acccgtgtgt 


aaagccgcgt 


ttccaaaatg 


tataaaaccg 


agagcatctg 


gccaatgtgc 


780 


atcagttgtg 


gtcagcagca 


aaatcaagtg 


aatcatctca 


gtgcaactaa 


aggggggatc 


840 


tagcgtttaa 


acttaagctt 


accatgggcg 


ccggcacccc 


ggtgaccgcc 


ccgctggcgg 


900 


gcactatctg 


gaaggtgctg 


gccagcgaag 


gccagacggt 


ggccgcaggc 


gaggtgctgc 


960 


tgattctgga 


agccatgaag 


atggaaaccg 


aaatccgcgc 


cgcgcaggcc 


gggaccgtgc 


1020 


gcggtatcgc 


ggtgaaagcc 


ggcgacgcgg 


tggcggtcgg 


cgacaccctg 


atgaccctgg 


1080 


cgggctctgg 


atccgatctg 


tacgacgatg 


acgataaggt 


acatcaaaca 


agtttgtaca 


1140 


aaaaagctga 


acgagaaacg 


taaaatgata 


taaatatcaa 


tatattaaat 


tagattttgc 


1200 


ataaaaaaca 


gactacataa 


tactgtaaaa 


cacaacatat 


ccagtcacta 


tggcggccgc 


1260 


attaggcacc 


ccaggcttta 


cactttatgc 


ttccggctcg 


tataatgtgt 


ggattttgag 


1320 


ttaggatccg 


gcgagatttt 


caggagctaa 


ggaagctaaa 


atggagaaaa 


aaatcactgg 


1380 


atataccacc 


gttgatatat 


cccaatggca 


tcgtaaagaa 


cattttgagg 


catttcagtc 


1440 


agttgctcaa 


tgtacctata 


accagaccgt 


tcagctggat 


attacggcct 


ttttaaagac 


1500 


cgtaaagaaa 


aataagcaca 


agttttatcc 


ggcctttatt 


cacattcttg 


cccgcctgat 


1560 


gaatgctcat 


ccggaattcc 


gtatggcaat 


gaaagacggt 


gagctggtga 


tatgggatag 


1620 


tgttcaccct 


tgttacaccg 


ttttccatga 


gcaaactgaa 


acgttttcat 


cgctctggag 


1680 


tgaataccac 


gacgatttcc 


ggcagtttct 


acacatatat 


tcgcaagatg 


tggcgtgtta 


1740 


cggtgaaaac 


ctggcctatt 


tccctaaagg 


gtttattgag 


aatatgtttt 


tcgtctcagc 


1800 


caatccctgg gtgagtttca 


ccagttttga 


tttaaacgtg 


gccaatatgg 


acaacttctt 


1860 
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cgcccccgtt 


ttcaccatgg 


gcaaatatta 


tacQcaaaac 


cracaaqatQC 


tgatgccgct


1920 


ggcgattcag gttcatcatg ccgtctgtga 


taacttccat 



gtcggcagaa 


tgcttaatga 


1980 


attacaacag tactgcgatg agtggcaggg 


caaaacataa 


acacataaat 


ccggcttact 


2040 


aaaagccaga taacagtatg cgtatttgcg 


cactcacaaa 


ccoatatata 



cccgaagtat 


2100 


gtcaaaaaga ggtgtgctat gaagcagcgt 


attacagtga 


cagttgacag 


cgacagctat 


2160 


cagttgctca 


aggcatatat 


gatgtcaata 


tctccggtct 


ggtaagcaca 


accatgcaga


2220 


atgaagcccg 


tcgtctgcgt 


gccgaacgct 


qaaaagcqqa 


aaatcaggaa 


aqqatqqctQ 


2280 


aggtcgcccg gtttattgaa atgaacggct 


cttttactaa 



caaaaacaaq 



gactggtgaa 


2340 


atgcagttta 


aggtttacac 


v_ L-ClL-ClClClCiyCl 


aaaaaccatt 



at cat eta tt 



tataaatata 


2400 


cagagtgata 


ttattgacac 






t CCCCC taOC 



caatacacat 


2460 


ctgctgtcag 


ataaagtctc 


\— y i_ y act v_ i— i— 


t a r* c r* era t aa 


tacatatcoq 


aoataaaaac 


2520 


tggcgcatga 


tgaccaccga 


tatggccagt 


ataccaatct 


ccattatcaa 


aaaaaaaata 



2580 


gctgatctca gccaccgcga 


aaatgacatc 


aaaaacacca 



ttaacctaat 



attctaaaaa 



2640 


atataaatgt 


caggctccgt 


tatacacagc 


>w ay u, v_. L-y v_ cxy 


atcaaccata 



ataactaaat 


2700 


atgttgtgtt 


ttacagtatt 


atgtagtctg 


ttttttatac 


aaaatctaat


ttaatatatt 



2760 


gatatttata 


tcattttacg 


tttctcgttc 


aant"t"1~r , t"t"a 


t acaaaataa 


taataattaa 



2820 


ttaagatcta 


gagggcccgt 


ttaaacccgc 


t~ na raorr t 



caactatacc 



ttctaaaatc 



2880


cagacatgat 


aagatacatt 


gatgagtttg 


aacaaaccac 



aactaaaata 


cagtgaaaaa 


2940 


aatgctttat 


ttgtgaaatt 


tgtgatgcta 


ttactttatt 


tgtaaccatt


ataagctgca


3000 


ataaacaagt 


taacaacaac 


aattgcattc 


attttatatt 



tcaaattcaa 


aaaaaaatat 


3060 


gggaggtttt 


ttaaagcaag 


taaaacctct 


acaaatataa 



tataactaat 



tatgatcagt 


3120 


cgacctgcag 


gcatgcaagc 


ttggcgtaat 


catggtcata 


gctgtttcct 


gtgtgaaatt 


3180 


gttatccgct 


cacaattcca 


cacaacatac 


gagccggaag


cataaagtgt


aaagcctggg


3240 


gtgcctaatg 


agtgagctaa ctcacattaa 


ttgcgttgcg


ctcactgccc 


gctttccagt 


3300 


cgggaaacct 


gtcgtgccag 


ctgcattaat 


gaatcggcca


acgcgcgggg


agaggcggtt


3360 


tgcgtattgg gcgctcttcc gcttcctcgc 


tcactgactc 


gctgcgctcg


gtcgttcggc


3420 


tgcggcgagc 


ggtatcagct 


cactcaaagg 


cggtaatacg 


gttatccaca 


gaatcagggg


3480 


ataacgcagg 


aaagaacatg 


tgagcaaaag 


gccagcaaaa 


ggccaggaac


cgtaaaaagg 


3540 


ccgcgttgct 


ggcgtttttc 


cataggctcc 


gcccccctga 


cgagcatcac 


aaaaatcgac 


3600 


gctcaagtca gaggtggcga 


aacccgacag 


gactataaag 


ataccaggcg 


tttccccctg 


3660 


gaagctccct 


cgtgcgctct 


cctgttccga 


ccctgccgct 


taccggatac


ctgtccgcct 


3720 
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ttctcccttc 


gggaagcgtg 


gcgctttctc 


atagctcacg 


ctgtaggtat 


ctcagttcgg 


3780 


tgtaggtcgt 


tcgctccaag 


ctgggctgtg 


tgcacgaacc 


ccccgttcag 


cccgaccgct 


3840 


gcgccttatc 


cggtaactat 


cgtcttgagt 


ccaacccggt 


aagacacgac 


ttatcgccac


3900 


tggcagcagc 


cactggtaac 


aggattagca 


gagcgaggta 


tgtaggcggt 


gctacagagt 


3960 


tcttgaagtg 


gtggcctaac 


tacggctaca 


ctagaaggac 


agtatttggt 


atctgcgctc 


4020 


tgctgaagcc 


agttaccttc 


ggaaaaagag 


ttggtagctc 


ttgatccggc 


aaacaaacca 


4080 


ccgctggtag 


cggtggtttt 


tttgtttgca 


agcagcagat 


tacgcgcaga 


aaaaaaggat 


4140 


ctcaagaaga 


tcctttgatc 


ttttctacgg 


ggtctgacgc 


tcagtggaac 


gaaaactcac 


4200 


gttaagggat 


tttggtcatg 


agattatcaa 


aaaggatctt 


cacctagatc 


cttttaaatt 


4260 


aaaaatgaag 


ttttaaatca 


atctaaagta 


tatatgagta 


aacttggtct 


gacagttacc 


4320 


aatgcttaat 


cagtgaggca 


cctatctcag 


cgatctgtct 


atttcgttca 


tccatagttg 


4380 


cctgactccc 


cgtcgtgtag 


ataactacga 


tacgggaggg 


cttaccatct 


ggccccagtg 


4440 


ctgcaatgat 


accgcgagac 


ccacgctcac 


cggctccaga 


tttatcagca 


ataaaccagc 


4500 


cagccggaag 


ggccgagcgc 


agaagtggtc 


ctgcaacttt 


atccgcctcc 


atccagtcta 


4560 


ttaattgttg 


ccgggaagct 


agagtaagta 


gttcgccagt 


taatagtttg 


cgcaacgttg 


4620 


ttgccattgc 


tacaggcatc 


gtggtgtcac 


gctcgtcgtt 


tggtatggct 


tcattcagct 


4680 


ccggttccca 


acgatcaagg 


cgagttacat 


gatcccccat 


gttgtgcaaa 


aaagcggtta 


4740 


gctccttcgg 


tcctccgatc 


gttgtcagaa 


gtaagttggc 


cgcagtgtta 


tcactcatgg 


4800 


ttatggcagc 


actgcataat 


tctcttactg 


tcatgccatc 


cgtaagatgc 


ttttctgtga 


4860 


ctggtgagta 


ctcaaccaag 


tcattctgag 


aatagtgtat 


gcggcgaccg 


agttgctctt 


4920 


gcccggcgtc 


aatacgggat 


aataccgcgc 


cacatagcag 


aactttaaaa 


gtgctcatca 


4980 


ttggaaaacg 


ttcttcgggg 


cgaaaactct 


caaggatctt 


accgctgttg 


agatccagtt 


5040 


cgatgtaacc 


cactcgtgca 


cccaactgat 


cttcagcatc 


ttttactttc 


accagcgttt 


5100 


ctgggtgagc 


aaaaacagga 


aggcaaaatg 


ccgcaaaaaa 


gggaataagg 


gcgacacgga 


5160 


aatgttgaat 


actcatactc 


ttcctttttc 


aatattattg 


aagcatttat 


cagggttatt 


5220 


gtctcatgag 


cggatacata 


tttgaatgta 


tttagaaaaa 


taaacaaata 


ggggttccgc


5280 


gcacatttcc 


ccgaaaagtg 


ccacctgacg 


tctaagaaac 


cattattatc 


atgacattaa 


5340 


cctataaaaa 


taggcgtatc 


acgaggccct 


ttcgt 






5375 



<210> 6 

<211> 72 

<212> PRT 

<213> Klebsiella pneumoniae 
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<400> 6 

Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys
1 5 10 15

Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu
20 25 30

Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln Ala
35 40 45

Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val
50 55 60

Gly Asp Thr Leu Met Thr Leu Ala
65 70



<210> 7 

<211> 115 

<212> PRT 

<213> Mus musculus 

<400> 7 

Lys Ala Leu Ala Val Ser Asp Leu Asn Arg Ala Gly Gln Arg Gln Val
1 5 10 15

Phe Phe Glu Leu Asn Gly Gln Leu Arg Ser Ile Leu Val Lys Asp Thr
20 25 30

Gln Ala Met Lys Glu Met His Phe His Pro Lys Ala Leu Lys Asp Val
35 40 45

Lys Gly Gln Ile Gly Ala Pro Met Pro Gly Lys Val Ile Asp Ile Lys
50 55 60

Val Ala Ala Gly Asp Lys Val Ala Lys Gly Gln Pro Leu Cys Val Leu
65 70 75 80

Ser Ala Met Lys Met Glu Thr Val Val Thr Ser Pro Met Glu Gly Thr
85 90 95

Ile Arg Lys Val His Val Thr Lys Asp Met Thr Leu Glu Gly Asp Asp
100 105 110

Leu Ile Leu
115



<210> 8 






<211> 123 
<212> PRT 

<213> Propionibacterium shermanii 
<400> 8 

Met Lys Leu Lys Val Thr Val Asn Gly Thr Ala Tyr Asp Val Asp Val
1 5 10 15

Asp Val Asp Lys Ser His Glu Asn Pro Met Gly Thr Ile Leu Phe Gly
20 25 30

Gly Gly Thr Gly Gly Ala Pro Ala Pro Arg Ala Ala Gly Gly Ala Gly
35 40 45

Ala Gly Lys Ala Gly Glu Gly Glu Ile Pro Ala Pro Leu Ala Gly Thr
50 55 60

Val Ser Lys Ile Leu Val Lys Glu Gly Asp Thr Val Lys Ala Gly Gln
65 70 75 80

Thr Val Leu Val Leu Glu Ala Met Lys Met Glu Thr Glu Ile Asn Ala
85 90 95

Pro Thr Asp Gly Lys Val Glu Lys Val Leu Val Lys Glu Arg Asp Ala
100 105 110

Val Gln Gly Gly Gln Gly Leu Ile Lys Ile Gly
115 120



<210> 9 

<211> 122 

<212> PRT 

<213> Homo sapiens 

<400> 9 

Gly Ser Cys Val Glu Val Asp Val His Arg Leu Ser Asp Gly Gly Leu
1 5 10 15

Leu Leu Ser Tyr Asp Gly Ser Ser Tyr Thr Thr Tyr Met Lys Glu Glu
20 25 30

Val Asp Arg Tyr Arg Ile Thr Ile Gly Asn Lys Thr Cys Val Phe Glu
35 40 45

Lys Glu Asn Asp Pro Ser Val Met Arg Ser Pro Ser Ala Gly Lys Leu
50 55 60

Ile Gln Tyr Ile Val Glu Asp Gly Gly His Val Phe Ala Gly Gln Cys
65 70 75 80

Tyr Ala Glu Ile Glu Val Met Lys Met Val Met Thr Leu Thr Ala Val
85 90 95

Glu Ser Gly Cys Ile His Tyr Val Lys Arg Pro Gly Ala Ala Leu Asp
100 105 110

Pro Gly Cys Val Leu Ala Lys Met Gln Leu
115 120



<210> 10 

<211> 156 

<212> PRT 

<213> Escherichia coli 

<400> 10 

Met Asp Ile Arg Lys Ile Lys Lys Leu Ile Glu Leu Val Glu Glu Ser
1 5 10 15

Gly Ile Ser Glu Leu Glu Ile Ser Glu Gly Glu Glu Ser Val Arg Ile
20 25 30

Ser Arg Ala Ala Pro Ala Ala Ser Phe Pro Val Met Gln Gln Ala Tyr
35 40 45

Ala Ala Pro Met Met Gln Gln Pro Ala Gln Ser Asn Ala Ala Ala Pro
50 55 60

Ala Thr Val Pro Ser Met Glu Ala Pro Ala Ala Ala Glu Ile Ser Gly
65 70 75 80

His Ile Val Arg Ser Pro Met Val Gly Thr Phe Tyr Arg Thr Pro Ser
85 90 95

Pro Asp Ala Lys Ala Phe Ile Glu Val Gly Gln Lys Val Asn Val Gly
100 105 110

Asp Thr Leu Cys Ile Val Glu Ala Met Lys Met Met Asn Gln Ile Glu
115 120 125

Ala Asp Lys Ser Gly Thr Val Lys Ala Ile Leu Val Glu Ser Gly Gln
130 135 140

Pro Val Glu Phe Asp Glu Pro Leu Val Val Ile Glu
145 150 155
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<210> 11 

<211> 216 

<212> DNA 

<213> Klebsiella pneumoniae 

<400> 11 



ggcgccggca ccccggtgac cgccccgctg gcgggcacta tctggaaggt gctggccagc 60

gaaggccaga cggtggccgc aggcgaggtg ctgctgattc tggaagccat gaagatggaa 120

accgaaatcc gcgccgcgca ggccgggacc gtgcgcggta tcgcggtgaa agccggcgac 180

gcggtggcgg tcggcgacac cctgatgacc ctggcg 216



<210> 12

<211> 345

<212> DNA

<213> Mus musculus

<400> 12

aaagccctgg ctgtaagcga cctgaaccgt gctggccaga ggcaggtgtt ctttgaactc 60

aatgggcagc ttcgatccat tctggttaaa gacacccagg ccatgaagga gatgcacttc 120

catcccaagg ctttgaagga tgtgaagggc caaattgggg ccccgatgcc tgggaaggtc 180

atagacatca aggtggcagc aggggacaag gtggctaagg gccagcccct ctgtgtgctc 240

agcgccatga agatggagac tgtggtgact tcgcccatgg agggcactat ccgaaaggtt 300

catgttacca aggacatgac tctggaaggc gacgacctca tccta 345



<210> 13 
<211> 369 
<212> DNA 

<213> Propionibacterium shermanii 
<400> 13 

atgaaactga aggtaacagt caacggcact gcgtatgacg ttgacgttga cgtcgacaag 60 

tcacacgaaa acccgatggg caccatcctg ttcggcggcg gcaccggcgg cgcgccggca 120 

ccgcgcgcag caggtggcgc aggcgccggt aaggccggag agggcgagat tcccgctccg 180 

ctggccggca ccgtctccaa gatcctcgtg aaggagggtg acacggtcaa ggctggtcag 240 

accgtgctcg ttctcgaggc catgaagatg gagaccgaga tcaacgctcc caccgacggc 300

aaggtcgaga aggtccttgt caaggagcgt gacgccgtgc agggcggtca gggtctcatc 360

aagatcggc 369



<210> 14 

<211> 366 

<212> DNA 

<213> Homo sapiens 

<400> 14 

ggctcatgtg tagaagtaga tgtacatcgg ctgagtgacg gtggactgct cttgtcctat 60

gatggcagca gttacaccac gtatatgaag gaggaagtag acagatatcg catcacaatt 120

ggcaataaaa cctgtgtgtt tgagaaggaa aatgacccat cggtgatgcg ctcaccttct 180 

gctgggaagt taatccagta cattgtagaa gatggaggtc atgtgtttgc cggccagtgc 240 

tatgcagaga ttgaggtaat gaagatggta atgactttga cagctgtgga gtctggctgt 300 

atccattacg tcaagcgtcc tggagcagct cttgaccctg gctgtgtact cgccaaaatg 360 

caactg 366



<210> 15 
<211> 468 
<212> DNA 

<213> Escherichia coli 
<400> 15 

atggatattc gtaagattaa aaaactgatc gagctggttg aagaatcagg catctccgaa 60 
ctggaaattt ctgaaggcga agagtcagta cgcattagcc gtgcagctcc tgccgcaagt 120
ttccctgtga tgcaacaagc ttacgctgca ccaatgatgc agcagccagc tcaatctaac 180
gcagccgctc cggcgaccgt tccttccatg gaagcgccag cagcagcgga aatcagtggt 240 
cacatcgtac gttccccgat ggttggtact ttctaccgca ccccaagccc ggacgcaaaa 300 
gcgttcatcg aagtgggtca gaaagtcaac gtgggcgata ccctgtgcat cgttgaagcc 360 
atgaaaatga tgaaccagat cgaagcggac aaatccggta ccgtgaaagc aattctggtc 420 
gaaagtggac aaccggtaga atttgacgag ccgctggtcg tcatcgag 468 



<210> 16 

<211> 8 

<212> PRT 

<213> Artificial 

<220> 

<223> FLAG epitope 

<400> 16 

Asp Tyr Lys Asp Asp Asp Asp Lys 
1 5 



<210> 17 

<211> 8 

<212> PRT 

<213> Artificial 

<220> 

<223> FLAG epitope 

<400> 17 



Asp Tyr Lys Asp Glu Asp Asp Lys 
1 5 






<210> 18 

<211> 9 

<212> PRT 

<213> Artificial 

<220> 

<223> Strep epitope 

<400> 18 

Ala Trp Arg His Pro Gln Phe Gly Gly
1 5



<210> 19

<211> 11

<212> PRT

<213> Artificial

<220>

<223> VSV-G epitope

<400> 19

Tyr Thr Asp Ile Glu Met Asn Arg Leu Gly Lys
1 5 10



<210> 20
<211> 6
<212> PRT
<213> Artificial

<220> 

<223> poly-His epitope 
<400> 20 

His His His His His His 
1 5 



<210> 21 

<211> 13 

<212> PRT 

<213> Artificial 

<220> 

<223> Influenza epitope 

<400> 21 

Tyr Pro Tyr Asp Val Pro Asp Tyr Ala Ile Glu Gly Arg
1 5 10 



<210> 22 

<211> 11 

<212> PRT 

<213> Artificial 




<220> 

<223> Human c-myc epitope 
<400> 22 

Glu Gln Lys Leu Leu Ser Glu Glu Asp Leu Asn
1 5 10



<210> 23 

<211> 3 

<212> PRT 

<213> Artificial 

<220> 

<223> tripeptide epitope 

<400> 23 

Glu Glu Phe 
1 



<210> 24 

<211> 5 

<212> PRT 

<213> Artificial 

<220> 

<223> enterokinase (EK) recognition site 

<400> 24 

Asp Asp Asp Asp Lys 
1 5



<210> 25

<211> 467

<212> DNA

<213> Artificial

<220>

<223> pET104-DEST vector

<220>

<221> CDS

<222> (177)..(464)

<220>

<221> misc_feature

<222> (466)..(467)

<223> n is a, c, g, or t

<400> 25

ataggcgcca gcaaccgcac ctgtggcgcc ggtgatgccg gccacgatgc gtccggcgta 60

gaggatcgag atctcgatcc cgcgaaatta atacgactca ctatagggga attgtgagcg 120

gataacaatt cccctctaga aataattttg tttaacttta agaaggagat atacat atg 179

Met
1

ggc gcc ggc acc ccg gtg acc gcc ccg ctg gcg ggc act atc tgg aag 227
Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys
5 10 15

gtg ctg gcc agc gaa ggc cag acg gtg gcc gca ggc gag gtg ctg ctg 275
Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu
20 25 30

att ctg gaa gcc atg aag atg gaa acc gaa atc cgc gcc gcg cag gcc 323
Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln Ala
35 40 45

ggg acc gtg cgc ggt atc gcg gtg aaa gcc ggc gac gcg gtg gcg gtc 371
Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val
50 55 60 65

ggc gac acc ctg atg acc ctg gcg ggc tct gga tcc gat ctg tac gac 419
Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp
70 75 80

gat gac gat aag gga att atc aca agt ttg tac aaa aaa gca ggc tnn 467
Asp Asp Asp Lys Gly Ile Ile Thr Ser Leu Tyr Lys Lys Ala Gly
85 90 95



<210> 26

<211> 96

<212> PRT

<213> Artificial

<220>

<223> pET104-DEST vector

<400> 26

Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp
1 5 10 15

Lys Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu
20 25 30

Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln
35 40 45

Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala
50 55 60

Val Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr
65 70 75 80

Asp Asp Asp Asp Lys Gly Ile Ile Thr Ser Leu Tyr Lys Lys Ala Gly
85 90 95



<210> 27

<211> 449

<212> DNA

<213> Artificial

<220>

<223> pET104/D-TOPO vector

<220>

<221> CDS

<222> (177)..(449)

<400> 27

ataggcgcca gcaaccgcac ctgtggcgcc ggtgatgccg gccacgatgc gtccggcgta 60

gaggatcgag atctcgatcc cgcgaaatta atacgactca ctatagggga attgtgagcg 120

gataacaatt cccctctaga aataattttg tttaacttta agaaggagat atacat atg 179

Met
1

ggc gcc ggc acc ccg gtg acc gcc ccg ctg gcg ggc act atc tgg aag 227
Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys
5 10 15

gtg ctg gcc agc gaa ggc cag acg gtg gcc gca ggc gag gtg ctg ctg 275
Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu
20 25 30

att ctg gaa gcc atg aag atg gaa acc gaa atc cgc gcc gcg cag gcc 323
Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln Ala
35 40 45

ggg acc gtg cgc ggt atc gcg gtg aaa gcc ggc gac gcg gtg gcg gtc 371
Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val
50 55 60 65

ggc gac acc ctg atg acc ctg gcg ggc tct gga tcc gat ctg tac gac 419
Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp
70 75 80

gat gac gat aag gga att gat ccc ttc acc 449
Asp Asp Asp Lys Gly Ile Asp Pro Phe Thr
85 90



<210> 28 

<211> 91 

<212> PRT 

<213> Artificial 

<220> 

<223> pET104/D-TOPO vector 
<400> 28 

Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp
1 5 10 15

Lys Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu
20 25 30

Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln
35 40 45

Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala
50 55 60

Val Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr
65 70 75 80

Asp Asp Asp Asp Lys Gly Ile Asp Pro Phe Thr
85 90



<210> 29

<211> 450

<212> DNA

<213> Artificial

<220>

<223> pcDNA/Biotag-DEST vector

<220>

<221> CDS

<222> (160)..(447)

<220>

<221> misc_feature

<222> (449)..(450)

<223> n is a, c, g, or t

<400> 29

cccattgacg caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctct 60

ctggctaact agagaaccca ctgcttactg gcttatcgaa attaatacga ctcactatag 120

ggagacccaa gctggctagc gtttaaactt aagcttacc atg ggc gcc ggc acc 174

Met Gly Ala Gly Thr
1 5

ccg gtg acc gcc ccg ctg gcg ggc act atc tgg aag gtg ctg gcc agc 222
Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val Leu Ala Ser
10 15 20

gaa ggc cag acg gtg gcc gca ggc gag gtg ctg ctg att ctg gaa gcc 270
Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala
25 30 35

atg aag atg gaa acc gaa atc cgc gcc gcg cag gcc ggg acc gtg cgc 318
Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln Ala Gly Thr Val Arg
40 45 50

ggt atc gcg gtg aaa gcc ggc gac gcg gtg gcg gtc ggc gac acc ctg 366
Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val Gly Asp Thr Leu
55 60 65

atg acc ctg gcg ggc tct gga tcc gat ctg tac gac gat gac gat aag 414
Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp Lys
70 75 80 85

gta cat caa aca agt ttg tac aaa aaa gca ggc tnn 450
Val His Gln Thr Ser Leu Tyr Lys Lys Ala Gly
90 95



<210> 30 

<211> 96 

<212> PRT 

<213> Artificial 

<220> 

<223> pcDNA/Biotag-DEST vector 
<400> 30 

Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp
1 5 10 15

Lys Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu
20 25 30

Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln
35 40 45

Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala
50 55 60

Val Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr
65 70 75 80

Asp Asp Asp Asp Lys Val His Gln Thr Ser Leu Tyr Lys Lys Ala Gly
85 90 95



<210> 31 

<211> 453 

<212> DNA 

<213> Artificial 



<220> 

<223> pcDNA6/Biotag/D-TOPO 



<220>

<221> CDS

<222> (160)..(453)

<400> 31

cccattgacg caaatgggcg gtaggcgtgt acggtgggag gtctatataa gcagagctct 60

ctggctaact agagaaccca ctgcttactg gcttatcgaa attaatacga ctcactatag 120

ggagacccaa gctggctagc gtttaaactt aagcttacc atg ggc gcc ggc acc 174

Met Gly Ala Gly Thr
1 5

ccg gtg acc gcc ccg ctg gcg ggc act atc tgg aag gtg ctg gcc agc 222
Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val Leu Ala Ser
10 15 20

gaa ggc cag acg gtg gcc gca ggc gag gtg ctg ctg att ctg gaa gcc 270
Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala
25 30 35

atg aag atg gaa acc gaa atc cgc gcc gcg cag gcc ggg acc gtg cgc 318
Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln Ala Gly Thr Val Arg
40 45 50

ggt atc gcg gtg aaa gcc ggc gac gcg gtg gcg gtc ggc gac acc ctg 366
Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala Val Gly Asp Thr Leu
55 60 65

atg acc ctg gcg ggc tct gga tcc gat ctg tac gac gat gac gat aag 414
Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp Lys
70 75 80 85

gta cct agg atc cag tgt ggt gga att gat ccc ttc acc 453
Val Pro Arg Ile Gln Cys Gly Gly Ile Asp Pro Phe Thr
90 95



<210> 32 

<211> 98 

<212> PRT 

<213> Artificial 

<220> 

<223> pcDNA6/Biotag/D-TOPO 

<400> 32 

Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp
1 5 10 15

Lys Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu
20 25 30

Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln
35 40 45

Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala
50 55 60

Val Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr
65 70 75 80

Asp Asp Asp Asp Lys Val Pro Arg Ile Gln Cys Gly Gly Ile Asp Pro
85 90 95

Phe Thr



<210> 33

<211> 744

<212> DNA

<213> Artificial

<220>

<223> pMT/Biotag-DEST vector

<220>

<221> CDS

<222> (454)..(741)

<220>

<221> misc_feature

<222> (743)..(744)

<223> n is a, c, g, or t

<400> 33

cgttgcagga caggatgtgg tgcccgatgt gactagctct ttgctgcagg ccgtcctatc 60

ctctggttcc gataagagac ccagaactcc ggccccccac cgcccaccgc cacccccata 120

catatgtggt acgcaagtaa gagtgcctgc gcatgcccca tgtgccccac caagagtttt 180

gcatcccata caagtcccca aagtggagaa ccgaaccaat tcttcgcggg cagaacaaaa 240

gcttctgcac acgtctccac tcgaatttgg agccggccgg cgtgtgcaaa agaggtgaat 300

cgaacgaaag acccgtgtgt aaagccgcgt ttccaaaatg tataaaaccg agagcatctg 360

gccaatgtgc atcagttgtg gtcagcagca aaatcaagtg aatcatctca gtgcaactaa 420

aggggggatc tagcgtttaa acttaagctt acc atg ggc gcc ggc acc ccg gtg 474

Met Gly Ala Gly Thr Pro Val
1 5

acc gcc ccg ctg gcg ggc act atc tgg aag gtg ctg gcc agc gaa ggc 522
Thr Ala Pro Leu Ala Gly Thr Ile Trp Lys Val Leu Ala Ser Glu Gly
10 15 20

cag acg gtg gcc gca ggc gag gtg ctg ctg att ctg gaa gcc atg aag 570
Gln Thr Val Ala Ala Gly Glu Val Leu Leu Ile Leu Glu Ala Met Lys
25 30 35

atg gaa acc gaa atc cgc gcc gcg cag gcc ggg acc gtg cgc ggt atc 618
Met Glu Thr Glu Ile Arg Ala Ala Gln Ala Gly Thr Val Arg Gly Ile
40 45 50 55

gcg gtg aaa gcc ggc gac gcg gtg gcg gtc ggc gac acc ctg atg acc 666
Ala Val Lys Ala Gly Asp Ala Val Ala Val Gly Asp Thr Leu Met Thr
60 65 70

ctg gcg ggc tct gga tcc gat ctg tac gac gat gac gat aag gta cat 714
Leu Ala Gly Ser Gly Ser Asp Leu Tyr Asp Asp Asp Asp Lys Val His
75 80 85

caa aca agt ttg tac aaa aaa gca ggc tnn 744
Gln Thr Ser Leu Tyr Lys Lys Ala Gly
90 95



<210> 34 

<211> 96 

<212> PRT 

<213> Artificial 

<220> 

<223> pMT/Biotag-DEST vector 

<400> 34 



Met Gly Ala Gly Thr Pro Val Thr Ala Pro Leu Ala Gly Thr Ile Trp
1 5 10 15

Lys Val Leu Ala Ser Glu Gly Gln Thr Val Ala Ala Gly Glu Val Leu
20 25 30

Leu Ile Leu Glu Ala Met Lys Met Glu Thr Glu Ile Arg Ala Ala Gln
35 40 45

Ala Gly Thr Val Arg Gly Ile Ala Val Lys Ala Gly Asp Ala Val Ala
50 55 60

Val Gly Asp Thr Leu Met Thr Leu Ala Gly Ser Gly Ser Asp Leu Tyr
65 70 75 80

Asp Asp Asp Asp Lys Val His Gln Thr Ser Leu Tyr Lys Lys Ala Gly
85 90 95



